ABSTRACT

The expected data rate produced by the Low Frequency Instrument (LFI),
planned to fly on the ESA Planck mission in 2007, is over a factor of 8 larger than
the bandwidth allowed by the spacecraft transmission system to download the
LFI data. We discuss the application of lossless compression to Planck/LFI data
streams in order to reduce the overall data flow. We perform both theoretical
analysis and experimental tests using realistically simulated data streams in
order to determine the statistical properties of the signal and the maximal compression
rate allowed by several lossless compression algorithms. We study the influence
of signal composition and of acquisition parameters on the compression rate Cr
and develop a semiempirical formalism to account for it. The best performing
compressor tested up to now is the arithmetic compression of order 1, designed
to optimize the compression of white-noise-like signals, which allows an overall
compression rate Cr = 2.65 ± 0.02. We find that this result is not improved
by other lossless compressors, since the signal is almost white noise dominated.
Lossless compression algorithms alone will not solve the bandwidth problem but
need to be combined with other techniques.
1. Introduction and Scanning Strategy



The Planck satellite (formerly COBRAS/SAMBA, Bersanelli et al. (1996)), which is
planned to be launched in 2007, will produce full sky CMB maps with high accuracy and
resolution over a wide range of frequencies (Mandolesi et al. (1998a), Puget et al. (1998)).
Table 1 summarizes the basic properties of LFI aboard Planck. The reported sensitivities
per resolution element (i.e. a square pixel with side equal to the Full Width at Half
Maximum (FWHM) extent of the beam), in terms of antenna temperature, represent the
goals of LFI for 14 months of routine scientific operations, as recently revised by the LFI
Consortium (Mandolesi et al. (1999)).



The limited bandwidth reserved for the downlink of scientific data calls for substantial
lossless compression, the theoretical upper limit on the compression rate being about four
(Maris et al. (199...)). Careful simulations are required to quantify the capability of real
compressors on "realistic" synthetic data and to improve the theoretical analysis, including
the CMB signal (monopole, dipole and anisotropies), foregrounds and instrumental noise.

During the data acquisition phase the Planck satellite will rotate at a rate of one circle
per minute around a given spin axis that changes its direction every hour (by 2.5' on the
ecliptic plane in the case of the simple scanning strategy), thus observing the same circle on
the sky 60 consecutive times (Mandolesi et al. (1998a), Mandolesi et al. (1998b)). LFI
will produce continuous data streams of temperature differences between the microwave
sky and a set of on-board reference sources; both differential measurements and reference
source temperatures must be recorded.



The LFI Proposal assumes a sampling time Ts ~ 7 msec for each detector (Mandolesi
et al. (1998a)), thus calling for a typical data rate of ~ 260 Kb/sec, while the bandwidth
allocated to download Planck data to ground is in total ~ 60 Kb/sec. Assuming the total
bandwidth to be equally split between the instruments, ~ 30 Kb/sec on average would be
assigned to LFI, requiring a compression of about a factor 8.4. Data have to be downloaded
without information losses and with minimal scientific processing on board.

A possible solution would be to adapt the sampling rate to the angular resolution
specific to each frequency. This should save up to a factor of ~ 9 for the 30
GHz channel, but since only ~ 7% of the samples come from that channel (see Table 1) the
overall reduction in the final data rate would be ~ 17%.

On the other hand, it is unlikely that the bandwidth for the downlink channel may be 
enhanced to solve the bandwidth problem, since the ground facilities are shared between 
different missions and there is the need to minimize possible cross-talks between the 
instrument and the communication system. 

With the aim of optimizing the transmission bandwidth dedicated to the downlink
of LFI data from the Planck spacecraft to the FIRST/Planck Ground Segment, we analyze
in detail the role that can be played by lossless compression of LFI data before they are
sent to Earth.

We apply different compression algorithms to suitable sets of Planck LFI simulated data 
streams generated by considering different combinations of astrophysical and instrumental 
signals and for different instrumental characteristics and detection electronics. 

The first contribution considered is that introduced by receiver noise: we consider
here the case of pure white noise and of white noise coupled to 1/f noise with different
knee frequencies. The reference load temperature is assumed to be 20 K for the present tests;
because of the strong dependence of the 1/f noise on the load temperature, this can be
considered a worst case, since the actual baseline reference load is at 4 K.

Different sky signal sources are subsequently added to the receiver noise: CMB
fluctuations, CMB dipole, Galaxy emission and extragalactic point sources. The signals
from the different sky components are convolved with the corresponding antenna pattern
shapes, assumed to be symmetric and gaussian with the FWHM reported in Table 1.

We generate simulated data streams at the two extreme frequency channels, 30 GHz 
and 100 GHz and consider data streams with different time lengths. 

Regarding the detection electronics, we explore different signal offset and scaling. 

The large number of combinations listed above was systematically explored using an
automated program generator, as described by Maris & Staniszkis (1998).



In section 2 we characterize quantitatively the LFI signal component by component. In
section 3 we discuss how the acquisition chain is modeled to perform compression simulations.
A theoretical analysis of the compression efficiency is presented in section 4, while section
5 is devoted to the analysis of the signal statistics. The subject of quantization error is
illustrated in section 6. The experimental protocol and results about compression are
reported in section 7. Further constraints on the on-board data compression are reported
in section 8. A proposal for an alternative coding method is made in section 9. The overall
compression rate is estimated in section 10. Conclusions are in section 11. An appendix is
included to further illustrate the estimation of the overall compression rate.



2. Characterization of Planck/LFI signal components 



The simulated cosmological and astrophysical components are generated according to
the methods described in Burigana et al. (1998b), and the data stream and noise generation
as in Burigana et al. (1997b), Seiffert et al. (1997) and Maino et al. (1999). We summarize
the basic points below.



• Modeling the CMB pattern - The CMB monopole and dipole have been generated
by using the Lorentz invariance of the photon distribution function, $\eta$, in phase space
(Compton-Getting effect): $\eta_{obs}(\nu_{obs}, \vec{n}) = \eta_{CMB}(\nu_{CMB})$, where $\nu_{obs}$ is the observation
frequency, $\nu_{CMB} = \nu_{obs}(1 + \vec{\beta} \cdot \vec{n})/\sqrt{1 - \beta^2}$ is the corresponding frequency in the CMB rest
frame, $\vec{n}$ is the unit vector of the photon propagation direction and $\vec{\beta} = \vec{v}/c$ is the observer
velocity. A blackbody spectrum at $T_0 = 2.725$ K (Mather et al. (1999)) is assumed for
$\eta$. For gaussian models, the CMB anisotropies at $\ell \geq 2$ can be simulated by following
the standard spherical harmonic expansion (see, e.g., Burigana et al. (1998a)) or by using
FFT (Fast Fourier Transform) techniques which take advantage of equatorial pixelisations
(Muciaccia et al. (1997)).

• Modeling the Galaxy emission - The Haslam map at 408 MHz (Haslam et al. (1982))
is the only full-sky map currently available, although large areas of the sky have been
sampled at 1420 MHz (Reich & Reich (1986)) and at 2300 MHz (Jonas et al. (1998)). To clean
these maps from free-free emission we use a 2.7 GHz compilation of ~ 7000 HII sources
(Witebsky (1978), private communication) at a resolution of ~ 1°. They are subtracted for
modelling the diffuse components and then re-added to the final maps. We use a spectral
index $\beta_{ff} = 2.1$ from 2.7 to 1 GHz and $\beta_{ff} =$ [...] below 1 GHz. We then combine the
synchrotron maps, producing a spectral index map between 408 and 2300 MHz with a
resolution of ~ 2° - 3° ($\langle \beta_{sync} \rangle \sim 2.8$). This spectral index map is used to scale the
synchrotron component up to ~ 10 GHz. In fact, for typical (local) values of the galactic
magnetic field (~ 2.5 μG), the knee in the electron energy spectrum in cosmic rays
(~ 15 GeV) corresponds to ~ 10 GHz (Platania et al. (1998)). From the synchrotron map
obtained at 10 GHz and the DMR 31.5 GHz map we derive a high frequency spectral index map
for scaling the synchrotron component up to Planck frequencies. These maps have a poor
resolution and the synchrotron structure needs to be extrapolated to Planck angular scales.
An estimate of the synchrotron angular power spectrum and of its spectral index, $\gamma$
($C_\ell \propto \ell^{-\gamma}$), has been provided by Lasenby et al. (1998); we used $\gamma = 3$ for the angular
structure extrapolation (Burigana et al. (1998a)). Schlegel et al. (1998) provided a map of
dust emission at 100 μm, merging the DIRBE and IRAS results to produce a map with IRAS
resolution (~ 7') but with DIRBE calibration quality. They also provided a map of dust
temperature, $T_d$, by adopting a modified blackbody emissivity law, $I_\nu \propto B_\nu(T_d)\,\nu^{\alpha}$, with
$\alpha = 2$. This can be used to scale the dust emission map to Planck frequencies using the
dust temperature map as input for the $B_\nu(T_d)$ function. Unfortunately the dust temperature
map has a resolution of ~ 1°; again, we use a power-law angular power spectrum $C_\ell$ to scale
the dust skies to the proper Planck resolution. Merging maps at different frequencies with
different instrumental features and potential systematics may introduce some internal
inconsistencies. More data on diffuse galactic emission, particularly at low frequency,
would be extremely important.

• Modeling the extragalactic source fluctuations - The simulated maps of point
sources have been created by an all-sky Poisson distribution of the known populations
of extragalactic sources in the $10^{\ldots} < S(\nu) < 10$ Jy flux range, exploiting the number
counts of Toffolatti et al. (1998) and neglecting the effect of clustering of sources. The
number counts have been calculated by adopting the Danese et al. (1987) evolution model
of radio selected sources and an average spectral index $\alpha =$ [...] for compact sources up to
~ 200 GHz and a break to $\alpha = 0.7$ at higher frequencies (see Impey & Neugebauer (1988),
De Zotti & Toffolatti (1998)), and by the model C of Franceschini et al. (1994), updated
as in Burigana et al. (1997a), to account for the isotropic sub-mm component estimated
by Puget et al. (1996) and Fixsen et al. (1996). At bright fluxes, far-IR selected sources
should dominate the number counts at High Frequency Instrument (HFI) channels for
$\nu > 300$ GHz, whereas radio selected sources should dominate at lower frequencies
(Toffolatti et al. (1998)).



• Instrumental noise - The white noise depends on the instrumental performance
(bandwidth $\Delta\nu$, system temperature $T_{sys}$), on the observed sky signal, $T_{sky}$, dominated by
the CMB monopole, and on the considered integration time, $\tau$, according to:

$\Delta T_{wn} = \frac{\sqrt{2}\,(T_{sys} + T_{sky})}{\sqrt{\Delta\nu\,\tau}}$ . (1)

Under certain idealistic assumptions, Burigana et al. (1997b) and Seiffert et al. (1997)
provide analytical estimates for the knee frequency, $f_k$, of LFI radiometers; it is predicted
to depend critically also on the load temperature, $T_{load}$, according to:

$f_k = \frac{A^2 \Delta\nu}{8} (1 - r)^2 \left( \frac{T_{sys}}{T_{sys} + T_{sky}} \right)^2$ , (2)

where $r = (T_{sky} + T_{sys})/(T_{load} + T_{sys})$ and $A$ is a constant, depending on the state of the
art of radiometer technology, which has to be minimized in order to reduce the knee
frequency via hardware (current estimates are $A \sim 1.8 \times 10^{-5}$ for 30 and 44 GHz radiometers and
$A \sim 2.5 \times 10^{-5}$ for 70 and 100 GHz).



Recent experimental results from Seiffert (private communication, Seiffert (1999)) show
knee frequency values of this order of magnitude, confirming that the present state of the art
of the radiometer technology is close to reaching the ideal case.

A pure white noise stream can be easily generated by employing well tested random
generator codes and normalizing their output to the white noise level $\Delta T_{wn}$. A noise stream
which takes into account both white noise and 1/f noise can be generated by using FFT
methods. After generating a realisation of the real and imaginary parts of the Fourier
coefficients with a spectrum defined by $S_{noise}(f) \propto (1 + f_k/f)$, we transform them and obtain
a real noise stream which has to be normalized to the white noise level $\Delta T_{wn}$ (Maino
et al. (1999)).
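The FFT recipe described above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the code used for the simulations: the function name `noise_stream`, its parameters, and in particular the normalisation choice (rescaling so that the white part of the same realisation has r.m.s. equal to the white noise level) are assumptions made here.

```python
import numpy as np

def noise_stream(n_samples, t_sample, f_knee, sigma_wn, seed=0):
    """White + 1/f noise with power spectrum proportional to (1 + f_knee/f).

    Hypothetical helper: names and normalisation are assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    freqs = np.fft.rfftfreq(n_samples, d=t_sample)
    # Random real and imaginary parts of the Fourier coefficients.
    coeffs = rng.standard_normal(freqs.size) + 1j * rng.standard_normal(freqs.size)
    # Amplitude shaping by the square root of the spectrum; the f = 0 term
    # is set to zero so that the stream has zero mean.
    shape = np.zeros_like(freqs)
    shape[1:] = np.sqrt(1.0 + f_knee / freqs[1:])
    stream = np.fft.irfft(coeffs * shape, n=n_samples)
    # White-only stream built from the same coefficients, used for normalisation.
    white = np.fft.irfft(coeffs * (freqs > 0), n=n_samples)
    return stream * (sigma_wn / white.std())

# One scan circle of 8640 samples at 7 ms sampling, knee frequency 0.1 Hz.
circle = noise_stream(8640, 7e-3, 0.1, 2e-3)
```

With `f_knee = 0` the shaping factor is unity and the function returns pure white noise with r.m.s. `sigma_wn`; any positive knee frequency adds low-frequency power on top of it.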



• Modeling the observed signal - We produce full sky maps, $T_{sky}$, by adding the
antenna temperatures from CMB, Galaxy emission and extragalactic source fluctuations.
Planck will perform differential measurements and not absolute temperature observations;
we then represent the final observation in a given i-th data sample in the form
$T_i = R_i (T_{sky,i} + N_i - T^{x}_{i})$, where $N_i$ is the instrumental noise generated as described above,
$T^{x}_{i}$ is a reference temperature subtracted in the differential data and $R_i$ is a constant
which accounts for the calibration. Of course, the uncertainty on $R_i$ and the non reduced
time variation of $T^{x}_{i}$ have to be much smaller than the Planck nominal sensitivity. Thus,
we generate the "observed" map assuming a constant value, $T^{x}$, of $T^{x}_{i}$ for all the data
samples. We note that possible small constant offsets in $T^{x}$ could in principle be accepted,
since they do not compromise an accurate knowledge of the anisotropy pattern. We arbitrarily
generate the "observed" map with $R_i = R = 1$ for all the data samples.

3. A model of Acquisition Chain 

To test rigorously the efficiency of different compressors, the best solution is to generate
a realistically simulated signal for different mission hypotheses and apply the given
compressors to it. To be realistic, the simulation of the signal generation should contain
both astrophysical and instrumental effects. It would also be helpful if the final simulation
were able to give a hint about the influence of the various signal components and
their variance. Of course it is not necessary to reproduce the LFI in full detail to obtain a
signal simulation accurate enough to test compressors. A simplified model of the LFI, its
front-end electronics and its operations will be enough.

At the base of the simplified model is the concept of acquisition pipeline. This pipeline
is composed of all the modules which process the astrophysical signal: from its collection
to the production of the final data streams which are compressed and then sent to Earth.
In the real LFI, the equivalent of the acquisition pipeline may be obtained by following the
flow of the astrophysical information, from the telescope through the front-end electronics
and the main Signal Processing Unit (SPU) to the memory of the Data Processing Unit
(DPU), which is in charge of downlinking it to the computer of the spacecraft and then to
Earth. The acquisition pipeline is represented in figure 1. Since its purpose is to describe
the signal processing and its parameters, it must not be regarded as a representation of the
true on-board electronics, since some functionalities may be shared between different real
modules. In this scheme the Front End operations of the true LFI are assigned to the first
simulation level, while on-board processing and compression are assigned to the second one.

The simulated microwave signal from the sky is collected and compared with the
temperature of a reference load which, in our simulations, is supposed to have exactly the
CMB temperature $T_0 = 2.725$ K (Mather et al. (1999))^1. The difference $\Delta T$, expressed
in μK, is sampled along a scan circle, producing a data stream of 60 scan circles with 8640
samples (pointings) each.

^1 Alternatively, the sky and the reference-load signals may be sampled separately and then
$\Delta T$ may be computed numerically by the DPU.

Signal detection is simulated by (Bersanelli et al. (1996), Maris et al. (1998), Maris
et al. (1999)):

$V_{out} = \mathrm{AFO} + \mathrm{VOT} \cdot \Delta T$ , (3)

where $V_{out}$ is the detection chain output in Volts, VOT is the antenna temperature to
detector voltage conversion factor (-0.5 V/K < VOT < +1.5 V/K), while AFO is a
detection chain offset (-5 V < AFO < +5 V). Of course in our simulation this offset takes
into account all offset sources, including variations of the reference temperature, and not
only the electrical offset. Similarly the VOT factor also takes into account differences
among the different detectors which affect the calibration of the temperature/voltage
relation. The range for VOT and AFO is large enough to include the whole set of nominal
instrumental configurations, allowing also for somewhat larger and smaller values.

The analog to digital conversion (ADC) is described by the formula:

$V_{out}(\mathrm{adu}) = \mathrm{trunc}\left( 2^{N_{bits}} \frac{V_{out} - V_{min}}{V_{max} - V_{min}} \right)$ , (4)

where trunc(.) is the decimal truncation operator, $N_{bits}$ is the number of quantization bits
produced by the ADC, while $V_{min}$ and $V_{max}$ are the lower and upper limits of the voltage
scale accepted in input by the ADC. In our case: $N_{bits} = 16$ bits, $V_{min} = -10$ V, $V_{max} = +10$
V. So the quantization unit "adu" (analog/digital unit) is

$1\ \mathrm{adu} = \frac{V_{max} - V_{min}}{2^{N_{bits}}}$ V (5)

or, in terms of antenna temperature, the quantization step is

$\Delta = \frac{V_{max} - V_{min}}{2^{N_{bits}}\,\mathrm{VOT}}$ ; (6)

for a typical VOT = 1 V/K and $N_{bits} = 16$ bits, $\Delta \approx 3 \times 10^{-4}$ K/adu. After digitization the
simulated signal is written into a binary file of 16 bits integers and sent to the compression
pipeline.
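As an illustration of the detection and digitization steps, the following Python sketch applies the detection formula of Eq. (3) and the ADC formula of Eq. (4) with the parameter values quoted in the text. The function names are hypothetical, and the clipping to the 16 bits range is an assumption added here to avoid overflow at the upper voltage limit; it is not part of the formulas above.

```python
import numpy as np

V_MIN, V_MAX, N_BITS = -10.0, 10.0, 16  # values quoted in the text

def detect(delta_t, vot=1.0, afo=0.0):
    """Detector output in Volts for a temperature difference delta_t in K."""
    return afo + vot * np.asarray(delta_t, dtype=float)

def digitize(v_out):
    """Rescale the voltage to the ADC range and truncate to an integer (adu)."""
    scaled = 2**N_BITS * (np.asarray(v_out, dtype=float) - V_MIN) / (V_MAX - V_MIN)
    # Assumption of this sketch: clip so that the result fits in 16 bits.
    return np.clip(np.trunc(scaled), 0, 2**N_BITS - 1).astype(np.uint16)

def quantization_step(vot=1.0):
    """Quantization step in K/adu for a given VOT in V/K."""
    return (V_MAX - V_MIN) / (2**N_BITS * vot)

# A white-noise-like stream with 2 mK r.m.s., digitized with VOT = 1 V/K.
rng = np.random.default_rng(0)
adu_stream = digitize(detect(2e-3 * rng.standard_normal(8640)))
```

For VOT = 1 V/K, `quantization_step` returns 20/65536 K/adu, i.e. about 3 x 10^-4 K/adu, in agreement with the value quoted above.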

The simplified LFI is composed of four acquisition pipelines, one for each frequency,
each one being representative of the set of devices which form the full detection channel
for the given frequency. The overall data rate after lossless compression for LFI should be
obtained by summing the contribution expected from each detector. Since in the real device
each radiometer for a given frequency channel will be characterized by different values of
VOT and AFO, the distribution of these parameters has to be taken into account when computing
the overall compression efficiency. In particular, greater attention should be devoted to the
distribution of the VOT parameter, since the compression efficiency is particularly sensitive
to it. However, since the distributions of operating conditions and instrumental parameters
are not yet fully defined, we assumed that all the detectors belonging to a given frequency
channel are identical and located at the telescope focus.



4. An Informal Theoretical Analysis About the Compression Efficiency 

An informal theoretical analysis may be helpful to evaluate the maximum lossless
compression efficiency expected from LFI and to discuss the behaviour of the different
compressors. For further details we refer the reader to Nelson & Gailly (1996).



^2 But see section 10 and the appendix for a more detailed discussion.

Data compression is based on the partition of a stream of bits into short chunks,
represented by strings of bits $S_{in}$ of fixed length $N_{bits}^{in}$, and on the coding of each string
into another string $S_{out}$ whose length $N_{bits}^{out}$ is variable and, in principle, shorter than
that of $S_{in}$. In this scheme, when the string of bits represents a message, the possible
combinations of bits in $S_{in}$ represent the symbols by which the message is encoded. From this
description, the compression operation is equivalent to mapping the input string set
$\{S_{in}\}$ into an output string set $\{S_{out}\}$ through a compressing function $\mathcal{F}_{comp}$. A
compression algorithm is called lossless when it is possible to reverse the compression
process, reconstructing the $S_{in}$ string from $S_{out}$ through a decompression algorithm. So the
condition for a compression program to be lossless is that the related $\mathcal{F}_{comp}$ is a one-to-one
mapping of $\{S_{in}\}$ into $\{S_{out}\}$. In this case the decompressing algorithm is the inverse
function of $\mathcal{F}_{comp}$. Of course, in the general case it is not possible to have at the same
time lossless compression and $N_{bits}^{in} > N_{bits}^{out}$ for every string in the input set. The problem
is solved assuming that the discrete distribution $P(S_{in})$ of strings belonging to the input
stream of bits is not flat but that a most probable string exists. So a good $\mathcal{F}_{comp}$ will
assign the shortest $S_{out}$ to the most probable $S_{in}$ and, the less probable the input string,
the longer the output string. In the worst case, output strings longer than the input string
will be assigned to those strings of $\{S_{in}\}$ which are least probable. With this statistical
tuning of the compression function the final length of the compressed stream will be shorter
than the original length, the average length of $S_{out}$ being:

$\overline{N}_{bits}^{out} = \sum_{S_{in} \in \{S_{in}\}} P(S_{in})\, N_{bits}^{out}(\mathcal{F}_{comp}(S_{in}))$ . (7)
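The average output length of Eq. (7) can be made concrete with a small Python sketch. Huffman coding is used here only as one familiar example of a compressing function that assigns shorter strings to more probable symbols; it is chosen for illustration and is not claimed to be one of the compressors tested in this work. The function names and the four-symbol distribution are hypothetical.

```python
import heapq

def huffman_lengths(probabilities):
    """Return {symbol: code length in bits} for a Huffman code."""
    # Each heap entry: (probability, tie-breaker, symbols in this subtree).
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probabilities, 0)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol they contain.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

def mean_output_length(probabilities):
    """Average output length: sum over symbols of P(S_in) * N_out(S_in)."""
    lengths = huffman_lengths(probabilities)
    return sum(p * lengths[s] for s, p in probabilities.items())

# Four symbols, i.e. 2 input bits per symbol, with a non-flat distribution.
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
print(mean_output_length(p))  # 1.75 bits per symbol on average
```

In this toy case the least probable symbols receive 3-bit codes, longer than the 2-bit input, yet the average output length is 1.75 bits, shorter than the input length.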

Several factors affect the efficiency of a given compressor; in particular, the best
performance is obtained when the compression algorithm is tuned to the specific
distribution of symbols. Since the symbol distribution depends on $N_{bits}$ and on the specific
input stream, an ideal general-purpose self-adapting compressor should be able to perform
the following operations: i) acquire the full bit stream (in the hypothesis that it has a finite
length) and divide it into chunks of length $N_{bits}$, ii) perform a frequency analysis of the
various symbols, iii) create an optimized coding table which associates to each $S_{in}$ a specific
$S_{out}$, iv) perform the compression according to the optimized coding table, v) send the
coding table to the uncompressing program together with the compressed bit stream. The
uncompressing program will restore the original bit stream using the associated optimized
coding table.

In practice, in most cases the chunk size $N_{bits}$ is hardwired into the compressing code
(typically $N_{bits} = 8$ or 16 bits); moreover, the fine tuning of the coding table for each specific
bit stream is too expensive in terms of computer resources to be performed in this way,
and the same holds for the transmission of the coding table. So there are compressors which
work as if the coding table or, equivalently, the compression function is fixed. In this way the
bit stream may be compressed chunk by chunk by the compressing algorithm, which will act as
a filter. Other compressors perform the statistical tuning on a small set of chunks taken at
the beginning of the stream, and then apply the same coding table to the full input stream.
In this case the compression efficiency will be sensitive to the presence of correlations
between different parts of the input stream. In this respect self-adaptive codes may be
more effective than non-adaptive ones, if their adapting strategy is sensitive to the kind of
correlations in the input stream.

On the other hand, other solutions may be adopted to obtain a good compromise
between computer resources and compression optimization. For example, all of the previous
compressors are called static, since the coding table is fixed in one way or another at the
beginning of the compression process and then used over the whole input stream. Another big
class of self-adaptive codes is represented by dynamical self-adaptive compressors, which
gain statistical knowledge about the signal as the compression proceeds, changing the coding
table from time to time. Of course these codes compress worse at the beginning and
better at the end of the data stream, provided its statistical properties are stationary.
They are also able to self-adapt to remarkable changes in the characteristics of the input
stream, but only if these changes can be sensed by the adapting code. Otherwise the
compressor will behave worse than a well-tuned static compressor. Moreover, if the signal
changes frequently, it may occur that the advantage of the dynamical self adaptability
is offset by the number of messages added to the output stream to inform the
decompressing algorithm of the changes that occurred in the coding table. Last but not least,
if some error occurs during the transmission of the compressed stream and the messages
about changes in the coding table are lost, it will be impossible to restore the stream
correctly at the receiving station. This problem may be less severe for a static compressor
since, as an example, it is possible to split the output stream into packets, putting stop
codes and storing the coding table on-board until a message from the receiving station is sent
back to confirm the correct transmission.

It is then clear that each specific compression algorithm is statistically optimized for a 
given kind of input stream with its own statistical properties. So to obtain an optimized 
compressor for LFI it is important to properly characterize the statistics of the signal to 
be compressed and to test different existing compressors in order to map the behaviour of 
different compression schemes using realistically simulated signals and, as soon as possible, 
the true signals produced by the LFI electrical model. 

In order to evaluate the performance of different compression schemes we considered
the Compression Rate defined as:

$C_r = \frac{L_u}{L_c}$ , (8)

where $L_u$ is the length of the input string in bytes and $L_c$ is the length of the output
string in bytes^3. Other important estimators of the performance of a given
compression code are the memory allocation and the compression time. Both of them must
be evaluated working on the final model of the on-board computer. Since this component is
not fully defined for the Planck/LFI mission, in this work we neglect these aspects of the
problem.

^3 Often compressors are evaluated looking at the compression efficiency $\eta_c = 1/C_r$, but we
considered $C_r$ more effective for our purposes.

The measurement represented by one of the 8640 samples which form one scan circle is
white noise dominated, the r.m.s. $\sigma_T$ being about a factor of ten higher than the CMB
fluctuation signal. If so, to a first approximation it is possible to treat the digitized
data stream from the front-end electronics as a stationary time series of independent samples
produced by a normally distributed white noise generator. In such a situation the symbols are
represented by the quantized signal levels, it is easy to infer the best coding table, and
from information theory the expected compression rate for an optimized compressor is
promptly estimated (Gaztañaga et al. (1998)). In our notation, for a zero average signal:

$C_r^{Th} = \frac{N_{bits} \ln 2}{\ln(\sqrt{2\pi e}\,\sigma_l/\mathrm{adu}) + \ln \mathrm{VOT}}$ , (9)

where $\sigma_l$ is the r.m.s. of the sampled signal^4.

From Eq. (9) it is possible to infer that the higher the VOT (i.e. the higher the $\Delta T$
resolution), the worse the compression rate, as already observed in Maris et al. (1998) and
Maris et al. (1999). The reason is that, as VOT is increased, the number
of quantization levels (i.e. of symbols) to be coded increases and their distribution
becomes flatter, increasing $\overline{N}_{bits}^{out}$. Assuming that all the white noise is thermal in origin,
$\sigma_l \approx \sigma_T \sim 2 \times 10^{-3}$ K. With the adu defined in equation (5), together with the typical values
of $V_{min}$ and $V_{max}$ assumed therein and $N_{bits} = 16$ bits, we have $C_r^{Th} \approx 11.09/(3.30 + \ln \mathrm{VOT})$.
In conclusion, for VOT = 0.5, 1.0, 1.5 V/K the $C_r^{Th}$ is respectively 4.26, 3.36, 3.00. In
addition, figure 2 represents the effect of a reduction of $N_{bits}$ on $C_r^{Th}$ compared to $C_r^{Th}$ for
$N_{bits} = 16$.
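The numbers quoted above can be checked with a short Python sketch. It assumes the gaussian white noise estimate $C_r^{Th} = N_{bits} \ln 2 / [\ln(\sqrt{2\pi e}\,\sigma_l/\mathrm{adu}) + \ln \mathrm{VOT}]$, with $\sigma_l = 2 \times 10^{-3}$ K and the ADC range of -10 V to +10 V; the function name and default arguments are choices made for this illustration.

```python
import math

def theoretical_cr(vot, sigma_k=2e-3, n_bits=16, v_min=-10.0, v_max=10.0):
    """Gaussian white noise estimate of the compression rate.

    vot     : conversion factor in V/K
    sigma_k : r.m.s. of the sampled signal in K
    """
    adu = (v_max - v_min) / 2**n_bits  # Volts per adu
    # Entropy of a finely quantized gaussian, in nats.
    entropy_nats = math.log(math.sqrt(2 * math.pi * math.e) * sigma_k / adu) + math.log(vot)
    return n_bits * math.log(2) / entropy_nats

for vot in (0.5, 1.0, 1.5):
    print(vot, round(theoretical_cr(vot), 2))
```

Evaluated without rounding the intermediate constants, the sketch gives values within about 0.01 of the 4.26, 3.36 and 3.00 quoted in the text; the small residual comes from the rounding of 11.09 and 3.30.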



^4 It has to be noted that eq. (9) is an approximate formula which is rigorously valid
when $\sigma_l/\mathrm{adu} \gg 1$.

5. Statistical Signal Analysis

A realistic estimation of the compression efficiency must be based on a quantitative
analysis of the signal statistics, which includes: statistics of the binary representation
(section 5.1), entropy (section 5.2) and normality tests (section 5.3).

5.1. Binary Statistics 

Most of the off-the-shelf compressors considered here do not handle 16 bits words, but 8
bits words. The 16 bits samples produced by the ADC unit are split into two consecutive
8 bits (1 byte) words, labeled the most significant bits (MSB) word and the least significant
bits (LSB) word. To properly understand the compression efficiency limits it is important to
understand the statistical distribution of the 8 bits words composing the quantized signal from
LFI.

Figure 3 represents the frequency distribution of symbols when the full data stream of
60 scan circles is divided into 8 bits words. Since for most of the samples the range spans
only ~ 64 levels (5 bits), the bytes corresponding to the MSB words assume a limited
range of values, producing the narrow spike in the figure. The belt shaped distribution
at the edges is due to the set of LSB words. The distributions are quite sensitive to the
quantization step, but do not change much with the signal composition, the largest
differences coming from the cosmological dipole contribution.

From the distribution in figure 3 one may wonder whether it would be possible to obtain
a more effective compression by splitting the data stream into two substreams: the MSB
substream (with compression efficiency $C_r^{MSB}$) and the LSB substream (with compression
efficiency $C_r^{LSB}$). Since the two components are so different in their statistics, with the MSB
substream having a higher level of redundancy than the original data stream, it would be
reasonable to expect the final compression rate $2/(1/C_r^{MSB} + 1/C_r^{LSB})$ to be greater than
the compression rate obtained by compressing the original data stream directly. We tested this
procedure taking some of the compressors considered for the final test. From these tests it
is clear that $C_r^{MSB} \gg C_r$, but since most of the redundancy of the original data stream is
contained in the MSB substream, the LSB substream cannot be compressed in an effective
way; as a result $C_r^{LSB} < C_r$ and $2/(1/C_r^{MSB} + 1/C_r^{LSB}) < C_r$. So the best way to perform an
efficient compression is to apply the compressor to the full stream without performing the
MSB/LSB separation. Apart from these theoretical considerations, we performed some
tests with our simulated data stream confirming this result.
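A test of this kind can be sketched in Python as follows. The general-purpose `zlib` compressor from the standard library is used purely as a stand-in, since it is readily available; it is not claimed to be one of the compressors used for the results reported here, and the numerical values it gives may differ from those of other compressors. The function names, the big-endian byte layout and the synthetic white-noise stream are assumptions of this sketch.

```python
import zlib
import numpy as np

def compression_rate(data: bytes) -> float:
    """Ratio of uncompressed to compressed length, in bytes."""
    return len(data) / len(zlib.compress(data, 9))

def split_test(samples):
    """Compare compressing the full 16 bits stream with the MSB/LSB split."""
    raw = np.asarray(samples, dtype=">u2").tobytes()  # big-endian: MSB first
    msb, lsb = raw[0::2], raw[1::2]
    c_full = compression_rate(raw)
    c_msb, c_lsb = compression_rate(msb), compression_rate(lsb)
    # Combined rate of the two substreams, which have equal uncompressed length.
    c_split = 2.0 / (1.0 / c_msb + 1.0 / c_lsb)
    return c_full, c_msb, c_lsb, c_split

# Synthetic stream: white noise with an r.m.s. of 6.5 adu around an arbitrary offset.
rng = np.random.default_rng(1)
samples = np.round(40000 + 6.5 * rng.standard_normal(8640)).astype(np.uint16)
c_full, c_msb, c_lsb, c_split = split_test(samples)
print(c_full, c_msb, c_lsb, c_split)
```

The MSB substream is highly redundant and compresses much better than the LSB substream; whether the combined rate of the split exceeds the rate for the full stream can then be read directly from the printed values for the compressor under test.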



5.2. Entropy Analysis 

Equation (P) is valid in the limit of a continuous distribution of quantization levels. 
Since in our case the quantization step is about one tenth of the signal rms this is no longer 
true. To properly estimate the maximum compression rate attainable from these data we 
evaluate the entropy of the discretized signal using different values of the VOT. 

Our entropy evaluation code takes the input data stream, determines the
frequency $f_s$ of each symbol $s$ in the quantized data stream and computes the entropy
as $H = -\sum_s f_s \log_2 f_s$, where $s$ is the symbol index. In our simulation we take both 8 and 16
bit symbols ($s$ spanning 0, ..., 255 and 0, ..., 65535). Since in our scheme the ADC
output is 16 bits, we considered the 8 bit symbol entropy both for the LSB and the MSB 8 bit
word, and the 8 bit entropy after merging the sets of least and most significant bytes.
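
A minimal sketch of such an entropy evaluation is given below. It is not the code used for the paper; the function names and the four-word example stream are hypothetical, and the "merged" entropy is interpreted here as the entropy of the LSB and MSB bytes taken together as one stream of 8 bit symbols.

```python
import math
from collections import Counter

def symbol_entropy(symbols):
    """H = -sum_s f_s log2 f_s, summed over the symbols that actually occur."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def entropies_16_and_8(words):
    """Entropy (bits/symbol) of 16 bit words and of their LSB and MSB bytes."""
    lsb = [w & 0xFF for w in words]
    msb = [(w >> 8) & 0xFF for w in words]
    return {
        "H16": symbol_entropy(words),
        "H8_lsb": symbol_entropy(lsb),
        "H8_msb": symbol_entropy(msb),
        # LSB and MSB bytes merged into a single stream of 8 bit symbols
        "H8_merged": symbol_entropy(lsb + msb),
    }

# Tiny hand-checkable example: frequencies 1/2, 1/4, 1/4 give H16 = 1.5 bits
example = entropies_16_and_8([0x0001, 0x0001, 0x0102, 0x0203])
print(example)
```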

As expected, since AFO merely shifts the quantized signal distribution, the entropy does
not depend on AFO. For this reason we take AFO = 0 V, i.e., no shift.

Table ^ reports the 16 bit entropy as a function of VOT, composition and frequency.
As expected the entropy, i.e. the information content, increases with increasing VOT, i.e. with
the quantization resolution. The distribution of the entropy $H$ allows us to evaluate the r.m.s. expected from
different data stream realizations:



$$ {\rm RMS}(C_r) \approx C_r \frac{{\rm RMS}(H)}{H}. \qquad (10) $$

Since data will be packed in chunks of finite length it is important to study not only
the entropy distribution for the entire data stream, which gives an indication of the
overall compressibility of the data stream as a whole, but also the entropy distribution for
short packets of fixed length. So each data stream was split into an integer number
of chunks of fixed length $L_{\rm chunk}$. For each chunk the entropy was measured, and the
corresponding distribution of entropies for the given $L_{\rm chunk}$ was characterized by its mean and rms.
We take $L_{\rm chunk}$ = 16, 32, 64, 135, 8640, 17280 16-bit samples, so each simulated
8640 × 60 data stream is split into 32400, 16200, 8100, 3840, 60, 30 chunks. Small
chunk sizes are introduced to study the entropy distribution as seen by most real
compressors, which do not compress one circle (8640 samples) at a time. Long chunk
distributions are useful to understand the entropy distribution for the overall data stream.

The entropy distribution per chunk is approximately described by a normal
distribution (see the figure), so the mean entropy and its r.m.s. are enough to characterize
the results. Note however that the corresponding distribution of compression rates is not
exactly normal; for the sake of this analysis we will nevertheless assume that
the $C_r$ distribution is also normal.

The mean entropy measured over one scan circle ($L_{\rm chunk}$ = 8640 samples) coincides with
the entropy measured for the full set of 60 scan circles, the entropy r.m.s. being of the order 
of 10~^ bits. Consequently the expected r.m.s. for Cr compressing one or more circles at a 
time will be less than 1%. 

The mean entropy and its rms are not independent quantities. The averaged entropy
decreases as $L_{\rm chunk}$ decreases, but correspondingly the entropy r.m.s. increases. As a
consequence the averaged $C_r$ decreases with decreasing $L_{\rm chunk}$, but the fraction of chunks in
which the compressor performs significantly worse than average increases. The overall
compression rate, i.e. the $C_r$ referred to the full mission, is affected by them.
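
The behaviour of the measured entropy with the chunk length can be reproduced with a short sketch. The toy stream below (four "circles" of quantized white noise with an r.m.s. of 10 quantization levels) and the function names are assumptions made for illustration; they are not the simulated LFI streams.

```python
import math
import random
from collections import Counter

def entropy_bits(chunk):
    """Empirical entropy of one chunk, in bits per sample."""
    n = len(chunk)
    return -sum((c / n) * math.log2(c / n) for c in Counter(chunk).values())

def chunk_entropy_stats(stream, l_chunk):
    """Mean and r.m.s. of the entropy over consecutive chunks of l_chunk samples."""
    n_chunks = len(stream) // l_chunk
    h = [entropy_bits(stream[i * l_chunk:(i + 1) * l_chunk]) for i in range(n_chunks)]
    mean = sum(h) / n_chunks
    rms = math.sqrt(sum((x - mean) ** 2 for x in h) / n_chunks)
    return mean, rms

rng = random.Random(1)
# Assumed toy stream: 4 circles of 8640 samples, noise r.m.s. = 10 quantization levels
stream = [round(rng.gauss(0.0, 10.0)) for _ in range(4 * 8640)]
stats = {l: chunk_entropy_stats(stream, l) for l in (16, 64, 135, 8640)}
for l_chunk, (mean, rms) in stats.items():
    print(f"L_chunk={l_chunk:5d}  <H>={mean:.3f} bits  rms={rms:.3f} bits")
```

A chunk of $L$ samples cannot show more than $\log_2 L$ bits of empirical entropy, which is one reason why the mean measured entropy drops for short chunks while its scatter grows.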



5.3. Normality Tests 

Since normal distribution of signals is assumed in ^ it would be interesting to fix 
how much the digitized signal distribution deviates from normality. It would also be
important to characterize the influence of the 1/f noise and of the other signal components,
especially the cosmic dipole, in the genesis of such deviations. To obtain an efficient
compression it is important that the samples are as far as possible statistically
uncorrelated and normally distributed. In addition one should make sure that the detection
chain does not cause any systematic effect which would introduce spurious non-normally
distributed components. This is relevant not only for the compression problem itself, which
among the data processing operations is the least sensitive to small deviations from the
normal distribution, but also in view of the future data reduction, calibration and analysis.
For them the hypothesis of normality in the signal distribution is very important in order to
allow a good separation of the foreground components. Last but not least, the hypothesis
of conservation of normality along the detection chain is important for the scientific
interpretation of the results, since the accuracy expected from the Planck/LFI experiment
should allow us to verify whether the distribution of the CMB fluctuations at $\ell > 14$ is really normal,
as predicted by the standard inflationary models, or not, as seems to be suggested by recent 4 years
COBE/DMR results (Bromley & Tegmark (1999), Ferreira, Gorsky, Magueijo (1999)).



For this reason a set of normality tests was applied to the different components of the
simulated signal before and after digitization in order to characterize the signal statistics
and its variation along the detection process. Of course this work may be regarded as a
first step in this direction; a true calibration of the signal statistics will be possible only
when the front end electronics simulator is available. These tests are furthermore
valuable as a preparation for the study of the true signal.

Normality tests were applied on the same data streams used for data compression. 
Given on board memory limits, it is unlikely that more than a few circles at a time can be 
stored before compression, so statistical tests were performed regarding each data stream
for a given pointing as a collection of 60 independent realizations of the same process. Of
course this is only approximately true. The 1/f noise correlates subsequent scan circles, but 
since its r.m.s. amplitude per sample is typically about one-tenth of the white noise r.m.s. 
or less, these correlations can be neglected in this analysis. 

Starting from the folded data streams, a given normality test was applied to each set
of 60 realizations for each one of the 8640 samples, transforming the stream of samples into a
stream of test results for the given test. The cumulative frequency distribution was then
computed over the 8640 test results. Since 60 samples do not represent a large statistics,
significant deviations from theoretically evaluated confidence levels are expected, resulting
in excessive rejection or acceptance rates. For this reason each test was calibrated by
applying it to the undigitized white noise data stream. Moreover, in order to analyze how
the normality evolves with increasing signal complexity, the tests were repeated while increasing the
information content of the generated data stream.

To simplify the discussion we considered as a reference test the usual Kolmogorov-Smirnov
$D$ test from Press et al. (1986) and we fix a 95% acceptance level. The test
was "calibrated" using the Monte Carlo white noise generator of our mission simulator
in order to fix the threshold level $D_{\rm th}$ as the $D$ value for which more than 95% of our
samples show $D < D_{\rm th}$. From Table ^ the quantization effect is evident: at twice the
nominal quantization step (VOT = 2 V/K), in 30% of the samples (i.e. 2592 samples)
the distribution of realizations deviates from a normal distribution ($D > D_{\rm th}$). Since the
theoretical compression rate from eq. (j^) is for a continuous distribution of levels ($\sigma \gg \Delta$),
a smaller $C_r$ is expected. Since the deviation from the normal distribution is a
systematic effect, for the sake of cosmological data analysis one may tune the $D$ test to
take account of the quantization. As an example, the third line in Tab. ^ reports the
threshold for the quantized signal for which 95% of the quantized white noise samples
are accepted as normally distributed. The line below represents the success rate for the full
quantized signal. After the recalibration the test is able to recognize that in 95% of the
cases the signal is drawn from a normal distribution, but at the cost of a growth in the
threshold $D$, which now is a function of the quantization step $\Delta$.
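
The calibration procedure can be sketched as follows. This is an illustration under stated assumptions, not the code used for the paper: the mean and r.m.s. of the noise are taken as known, quantization is done by rounding to the nearest level, only 400 sample positions are simulated instead of 8640, and the quantization step is deliberately set as coarse as the noise r.m.s. so that the effect is clearly visible.

```python
import math
import random

def normal_cdf(x, sigma):
    """Cumulative distribution of a zero-mean normal variable."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def ks_d(values, sigma):
    """Kolmogorov-Smirnov D between the sample and the normal distribution."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, sigma)
        d = max(d, f - i / n, (i + 1) / n - f)
    return d

def threshold(d_values, level=0.95):
    """Smallest D such that a fraction `level` of the tests has D <= threshold."""
    ordered = sorted(d_values)
    return ordered[math.ceil(level * len(ordered)) - 1]

rng = random.Random(2)
n_real, n_pos, sigma = 60, 400, 1.0   # 60 realizations for each sample position
step = sigma                           # coarse quantization step (assumption)

white = [[rng.gauss(0.0, sigma) for _ in range(n_real)] for _ in range(n_pos)]
quantized = [[step * round(x / step) for x in col] for col in white]

d_white = [ks_d(col, sigma) for col in white]
d_quant = [ks_d(col, sigma) for col in quantized]

d_th = threshold(d_white)                          # calibrated on undigitized white noise
rejected = sum(d > d_th for d in d_quant) / n_pos  # quantized data against that threshold
d_th_quant = threshold(d_quant)                    # threshold recalibrated on quantized noise
print(f"D_th={d_th:.3f}  rejected={rejected:.0%}  recalibrated D_th={d_th_quant:.3f}")
```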

As for the entropy distribution and the binary statistics, in this case too most of the
differences between the results obtained for a pure white noise signal and the full signals
are explained by the presence of the cosmological dipole. These simulations are not
accurate enough to draw any quantitative conclusions about the distortion in the sampling
statistics induced by digitization, but they suggest that approximating the instrumental
signal as quantized white noise plus a cosinusoidal term associated with the cosmic dipole is
more than adequate to understand the optimal lossless compression rate achievable
in the case of the Planck/LFI mission.



6. Quantization and Quantization Error 

A possible solution to the bandwidth problem is to reduce the amount of
information of the sampled signal, i.e. its entropy. Independently of the way in which
this is performed, the final compression strategy will be lossy, and the final reconstructed
(uncompressed) signal will be corrupted with respect to the original one, degrading in some
regard the experimental performances. In this regard, any sort of lossy compression may be
seen as a kind of signal rebinning with a coarser resolution (quantization step) in $\Delta T/T$.
There are at least six aspects in Planck/LFI operations which may be affected by a
coarser quantization:

1. $C_\ell$ and periodic signal reconstruction;

2. destriping; 

3. foreground separation; 

4. point like sources detection; 

5. variable sources characterization; 

6. tests for normality of CMB fluctuations. 

Because of the non-linear nature of the quantization process, all of them are hard to
evaluate analytically and for this reason a specific simulation task is in progress for the
Planck/LFI collaboration (White & Seiffert (1999), Maris et al. (2000)). However a
heuristic evaluation for point (1) by analytical means is feasible.

Quantization operates a convolution of the normal distribution of the input signal
with the quantization operator $(x : \Delta) = {\rm sign}(x)\,\Delta\,{\rm floor}(|x/\Delta|)$. If the quantization
error $(x - (x : \Delta))$ is uniformly distributed its expectation is $\Delta/2$ and its r.m.s. is
$\Delta/\sqrt{12}$ (Kollar (1994)). Quantization over a large number of samples may be regarded as
an extra source of noise which will enhance the variance per sample. If the quantization
error is statistically independent of the input quantized signal and if it may be added
in quadrature to the white noise r.m.s. $\sigma_{WN}$, the total variance per sample will be
$\sigma_{WN}^2 \left(1 + \frac{\Delta^2}{12 \sigma_{WN}^2}\right)$. So for $\Delta \lesssim \sigma_{WN}$ the expected increase of the
r.m.s. due to quantization is $\lesssim 4\%$. From error
propagation the relative error on the $C_\ell$ is (Maino (1999))



so that the quantization contribution to the overall error will be small and dominated by the
cosmic variance for a large set of $\ell$. However the application of such an encouraging result must
be considered carefully in a true experimental framework. Apart from the assumptions,
it has indeed to be demonstrated that a large quantization error like this will not
significantly harm the aforementioned aspects; moreover the impact of signal quantization will
depend on how and at which point of the detection chain it is performed.
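
As a worked example of the heuristic estimate, the sketch below implements the quantization operator $(x : \Delta)$ as written in the text and evaluates the growth of the per-sample r.m.s. when a uniform quantization error of r.m.s. $\Delta/\sqrt{12}$ is added in quadrature to the white noise. The function names are hypothetical, and the independence of the quantization error from the signal is assumed, as in the text.

```python
import math

def quantize(x, delta):
    """(x : delta) = sign(x) * delta * floor(|x / delta|)."""
    return math.copysign(delta * math.floor(abs(x / delta)), x)

def rms_increase(delta, sigma_wn):
    """Relative growth of the per-sample r.m.s. caused by quantization noise
    of r.m.s. delta / sqrt(12) added in quadrature to sigma_wn."""
    total_variance = sigma_wn ** 2 * (1.0 + delta ** 2 / (12.0 * sigma_wn ** 2))
    return math.sqrt(total_variance) / sigma_wn - 1.0

print(quantize(2.7, 1.0), quantize(-2.7, 1.0))   # truncation towards zero
print(f"{rms_increase(1.0, 1.0):.2%}")           # step equal to the noise r.m.s.
print(f"{rms_increase(0.1, 1.0):.3%}")           # step one tenth of the noise r.m.s.
```

With a step equal to the noise r.m.s. the increase is $\sqrt{13/12} - 1$, i.e. about 4%; with a step of one tenth of the r.m.s. it falls below 0.05%.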



7. Experimental Evaluation of Off-The-Shelf Compressors 

This section describes the evaluation protocol and the experimental results of the 
compression of simulated data streams for Planck/LFI. 



7.1. Evaluation Protocol 



First tests were performed on an HP-UX workstation on four compressors (Maris
et al. (1998)), but given the limited number of off-the-shelf compression codes for such a
platform, we migrated the compression pipeline to a Pentium III based Windows/NT
workstation.

As described in section |^ the signal composition is defined by many components, both 
astrophysical and instrumental in origin. In particular, it is important to understand how 
each component or instrumental parameter, introducing deviations to the pure white noise 
statistics, affects the final compression rate. 

To scan systematically all the relevant combinations of signal compositions and
off-the-shelf compressors, a Compression Pipeline was created. The pipeline is based on five
main components: the signal quantization pipeline, the signal database, the compression
pipeline, the compression database, and the post-processing pipeline. The signal quantization
pipeline performs the operations described in the upper part of figure 1. The simulated
astrophysical signals are held in a dedicated section of the signal archive; they are processed
by the quantization pipeline and stored back in a reserved section of the signal archive. In this way
quantized data streams are generated for each relevant combination of the quantization
parameters, signal composition and sky pointing.

Each compressor is then applied by the compression pipeline to the full set of quantized 
signals in the signal archive. Results, in terms of compression efficiency as a function of 
quantization parameters are stored in the compression database. The statistical analysis of 
section ^ are performed with a similar pipeline. 

Finally the post-processing pipeline scans the compression database in order to
produce plots, statistics, tables and synthetic fits. Its results are usually stored in one of
the two databases.

The pipeline is managed by PERL 5.004 script files which drive FORTRAN, C, IDL
programs or off-the-shelf utilities, gluing and coordinating their activities. Up to ~75 000
lines of repetitive code are required per simulation run. They are generated by a specifically
designed Automated Program Generator (APG) written in IDL (Maris & Staniszkis (1998)).



The APG takes as input a table which specifies: the set of compressors to be tested,
the set of quantization parameters to be used, the order in which to perform the scan of
each parameter/compressor, the list of supporting programs to be used, and other servicing
parameters. The program linearizes the resulting parameter space and generates the PERL
simulation code or, alternatively, performs other operations such as scanning the results
database to produce statistics, plots, tables, and so on. The advantage of this method is
that a large amount of repetitive code may be quickly produced, maintained or replaced
with minor effort each time a new object (compressor, parameter or analysis method) is
added to the system.
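
The idea of linearizing a parameter space into repetitive driver code can be sketched in a few lines. The sketch is written in Python rather than in IDL and PERL, and it is not the APG itself: the table layout, the `run_compressor` command and its options are hypothetical names introduced only for illustration.

```python
import itertools

def linearize(table):
    """Expand a table of parameter lists into a flat sequence of runs.
    The keys listed in `scan_order` fix the nesting of the scan:
    the first key varies slowest, the last key fastest."""
    keys = table["scan_order"]
    for values in itertools.product(*(table[k] for k in keys)):
        yield dict(zip(keys, values))

def generate_script(table, template):
    """Return one line of driver code for each point of the parameter space."""
    return [template.format(**run) for run in linearize(table)]

# Hypothetical input table: compressor macros, VOT values (V/K), circles per run
table = {
    "scan_order": ["compressor", "vot", "n_circles"],
    "compressor": ["arith-n1", "uses"],
    "vot": [0.5, 1.0, 2.0],
    "n_circles": [1, 2],
}
# Hypothetical command line; a real pipeline would call its own programs here
template = "run_compressor --macro {compressor} --vot {vot} --circles {n_circles}"
script_lines = generate_script(table, template)
print("\n".join(script_lines))
```

Adding a compressor or a parameter value then only requires a new entry in the table, which is the maintenance advantage described above.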

7.2. Experimental Results 

The purpose of these compression tests is to give an upper limit to the lossless compression
efficiency for LFI data and to look for an optimal compressor to be proposed to the LFI
consortium.

A decision about the final compression scheme for Planck/LFI has not been taken yet 
and only future studies will be able to decide if the best performing one will be compatible 
with on-board operations (constrained by: packet independence and DPU capabilities) and 
will be accepted by the Planck/LFI collaboration. 

For this reason up to now only off-the-shelf compressors and hardware were
considered. To test any reasonable compression scheme a wide selection of lossless
compression algorithms, covering all the known methods, was applied to our simulated data.
Lacking a comprehensive criterion to fix a final compressor, such as memory and CPU constraints,
we report in a compact form the results related to all the tested compressors. We are
confident that in the near future long duration balloon flight experiments as well as on-board
electronics prototypes will provide us with a more solid base to test and improve the final
compression algorithms looking at real data.

Tables ^, ^ list the selected compression programs. Since the behaviour (and efficiency)
of each compressor is determined by a set of parameters, one or more macro files operating a
given combination of compressor code plus parameters are defined. It has to be noted that
uses is a space qualified algorithm, based on the Rice compression method, for which space
qualified dedicated hardware already exists.
To evaluate the performances of each compressor, figures of merit are drawn like the
one in figure ^ which shows the results for the best performing compressor: arith-n1.
Looking at such figures it is possible to note that the compression efficiency does not depend
much on the signal composition. This is true even when large, impulsive signals such as planets,
affecting few samples over thousands, are introduced. Again, this is a consequence of the
fact that white noise dominates the signal, being the most important component affecting
the compression efficiency. In this regard it has been speculated that the 1/f component
should improve the correlation between neighbouring samples, affecting the compression
efficiency (Maris et al. (1998)), but no relevant effect is detected in our simulations. As
an example, from figure |^ for the 30 GHz signal the addition of the 1/f noise to the white
noise data stream affects the final $C_r$ by less than 0.5%.

The only noticeable (i.e. some 6%) effect due to an increase in the signal complexity
occurs when the cosmic dipole is added. In the present signal the dipole amplitude is
comparable with the white noise amplitude (~3 mK) so its effect is to distort the sample
distribution, making it leptokurtic. As a consequence compressors, which usually work
best for a normally distributed signal, become less effective. Since the dipole introduces
correlations over one full scan circle, i.e. 8640 samples, while compressors establish
the proper coding table by observing the data stream for a small set of consecutive samples
(from some tens to some hundreds of samples), even a self-adaptive compressor will likely lose
the correlation introduced by the dipole. A proper solution to this problem is suggested in
section ^. The other signal components do not introduce any noticeable systematic effect.
The small differences shown by the figures of merit may be due to the compression variance
and depend strongly on the compressor of choice. As an example a given compressor may
be more effective in compressing the simulated data stream with the full signal than the
associated simpler data stream containing only white noise, 1/f noise, CMB and dipole. At
the same time another compressor may show the opposite behaviour.

As shown by Figure ^ and as expected from eq. (^), increasing VOT, i.e. increasing
the quantization step, increases the compression rate. In addition $C_r$ increases with increasing
$N_c$ by up to about 20%. The increase is noticeable for $N_c < 15$ and saturates after $N_c = 30$.
On the contrary its dependence on the offset (AFO) is negligible (less than 1%). For these
reasons in the subsequent analysis the AFO dependency is neglected and the corresponding
simulations are averaged.



7.3. Synthetical Description 

The full database of simulated compression results takes about 14 MBytes; for
practical purposes it is possible to synthesize all this information using a phenomenological
relation which connects $C_r$ with $N_c$ and VOT, whose free parameters may be fitted using
the data obtained from the simulations. In short:

$$ C_r^{\rm Fit}({\rm VOT}, N_c) = \frac{C_{r,1}}{I(N_c) + S(N_c) \ln\left[\frac{\rm VOT}{1\ {\rm V/K}}\right]} \qquad (12) $$

where $C_{r,1}$ is the $C_r$ for $N_c = 1$, VOT = 1.0 V/K, while $I(N_c)$ and $S(N_c)$ describe the
dependence on $N_c$. In particular the relation is calibrated for any compressor imposing that
$C_r$(VOT = 1 V/K, $N_c$ = 1) = $C_{r,1}$.

The linear dependency of $1/C_r^{\rm Fit}$ on ln VOT is a direct consequence of equation (P),
and is confirmed by a set of tests performed over the full set of our numerical results for
the compression efficiency, the r.m.s. residual between the best fit (12) and the simulated
data being less than 1.5% in almost 92% of the cases and less than 1% in 72% of the
cases. The dependencies of its parameters $I$ and $S$ on $N_c$ are obtained by a trial-and-error
method performed on our data set and we did not investigate further their nature. For
all practical purposes our analysis shows that these functions are well approximated by
series expansions (relations (13) and (14)). Here $S_1$ and the expansion coefficients are free
parameters obtained by fitting the simulated data; in particular
$S_1$ is the slope for $N_c = 1$.

Since an accuracy of some percent in determining the free parameters of $C_r^{\rm Fit}$(VOT, $N_c$)
is enough, the fitting procedure was simplified as follows. For a given compressor, signal
component, swap status, and $N_c$ value, $I$ and $S$ were determined by a fitting procedure.
The lists of $I$ and $S$ as a function of $N_c$ were then fitted by using relations (13) and
(14) respectively. The fitting algorithm tests different degrees of the polynomial in the
aforementioned relations (up to 2 for $I(N_c)$, up to 5 for $S(N_c)$), stopping when the maximum
deviation of the fitted relation with respect to the data is smaller than 0.5% for $I$ or 0.0001 for
$S$, or when the maximum degree is reached.
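
The first step of this procedure, the determination of $I$ and $S$ at a fixed $N_c$, can be sketched as a straight-line fit. The sketch assumes that relation (12) has the form $C_r = C_{r,1} / (I + S \ln[{\rm VOT} / 1\,{\rm V/K}])$, so that $C_{r,1}/C_r$ is linear in ln VOT. The numerical values used to build the synthetic data (including the sign of $S$) are arbitrary placeholders, not results from the tables.

```python
import math

def cr_fit(vot, cr1, i_nc, s_nc):
    """Assumed form of relation (12) at fixed N_c; vot is in V/K."""
    return cr1 / (i_nc + s_nc * math.log(vot))

def fit_i_and_s(vots, crs, cr1):
    """Least-squares straight line through the points (ln VOT, C_r1 / C_r).
    Returns the intercept I and the slope S."""
    xs = [math.log(v) for v in vots]
    ys = [cr1 / c for c in crs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Synthetic "simulation results" built from placeholder parameters
cr1, i_true, s_true = 2.65, 1.0, 0.2
vots = [0.5, 1.0, 1.5, 2.0]
crs = [cr_fit(v, cr1, i_true, s_true) for v in vots]
i_fit, s_fit = fit_i_and_s(vots, crs, cr1)
print(f"I = {i_fit:.4f}, S = {s_fit:.4f}")
```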

Tables H, 0, ^, | report the results of the compression exercise ordered by decreasing
$C_{r,1}$. The first column is the name of the compression macro (i.e. a given compression program
with a defined selection of modifiers and switches) as listed in tables ^, |[. The third
and fourth columns are the fitted $C_{r,1}$ and $S_1$ as defined in (12), (13), (14). From the 5th
to the 7th columns and from the 8th to the 13th columns the polynomial degree and the
expansion parameters for (13) and (14) are reported.

Many compressors are sensitive to the ordering of the Least and Most Significant Bytes
of a 16 bit word in the computer memory and files. Two ordering conventions are assumed:
UnSwapped, i.e. Least Significant Byte stored first, or Swapped, i.e. Most Significant
Byte stored first. As in the Digital VAX/VMS operating system, the Microsoft Windows/NT
operating system convention is Most Significant Byte first. For this reason each test was
repeated twice, once with the original data stream file with swapped bytes and
once after unswapping the bytes. If the gain in $C_{r,1}$ after unswapping is bigger than some
percent, the unswapped compression is reported, otherwise the swapped one is reported. These
two cases are distinguished by the second column of tables ^, |^, ^, |^, which is marked with a y
if unswapping is applied before compressing. It is interesting to note that not only 16 bit
compressors, such as uses, are sensitive to swapping. Many 8 bit compressors are also
sensitive to it; this may be due to the fact that if the most probable 8 bit symbol is
presented first to the compressor a slightly better balanced coding table is built.
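
A swap test of this kind can be sketched with standard library tools. The sketch uses zlib (deflate) as a generic stand-in compressor, which is not one of the programs tested here, and a toy stream of quantized white noise. The function names are hypothetical, and no claim is made about which byte order a given compressor prefers.

```python
import random
import struct
import zlib

def pack_words(words, msb_first):
    """Serialize unsigned 16 bit words with the requested byte order."""
    fmt = (">" if msb_first else "<") + f"{len(words)}H"
    return struct.pack(fmt, *words)

def swap_bytes(data):
    """Exchange the two bytes of every 16 bit word in a byte string."""
    out = bytearray(len(data))
    out[0::2] = data[1::2]
    out[1::2] = data[0::2]
    return bytes(out)

def compression_rate(data):
    """Uncompressed length divided by compressed length."""
    return len(data) / len(zlib.compress(data, 9))

rng = random.Random(3)
# Assumed toy stream: one circle of 8640 samples, noise r.m.s. = 10 levels
words = [max(0, min(65535, round(rng.gauss(1000, 10.0)))) for _ in range(8640)]
lsb_first = pack_words(words, msb_first=False)
msb_first = swap_bytes(lsb_first)
rates = {"LSB first": compression_rate(lsb_first),
         "MSB first": compression_rate(msb_first)}
print(rates)
```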

It should be noted that the coefficients reported here are obtained compressing one
or more full scan circles at a time, so their use to extrapolate $C_r$ when each scan circle is
divided into small chunks which are separately compressed has to be performed carefully,
especially for VOT $\lesssim$ 0.5 V/K where some extrapolated $C_r$ grow instead of decreasing
for a decreasing $N_c$ as in most of the cases. However we did not investigate the
problem further because the time required to perform all the tests over all the compressors increases
with decreasing $N_c$, and because up to now a final decision about the packet length has not been
made. Moreover, short data chunks introduce other constraints which are not accounted
for by eq. (|) but which are discussed in section ^j.

Apart from the choice of the best compressor, Tables ^, 0, H, ^ allow interesting
comparisons.

The performances of the arithmetic compression arith are very sensitive to changes in
the coding order n = 0, ..., 7. The computational weight grows with n, while $C_r$ is minimal
at n = 0, maximal for n = 1 and decreases when n increases further.

Both non-Adaptive Huffman (huff-c) and Adaptive Huffman (ahuff-c) are in the list
of the worst compressors, considering both the pure white noise signal and the full signal.

We implemented the space-qualified uses compressor with a wide selection of
combinations of its control parameters: the number of coding bits, the number of samples
per block, the possibility to search for correlations between neighbouring samples. We
report the tests for 16 bit coding only, changing the other parameters. The uses compressor is very
sensitive to byte unswapping: when it is not performed uses does not compress at all. On
the other hand, contrary to arith, the sensitivity of the final $C_r$ to the various control
parameters is small or negligible. In most cases $C_{r,1}$ differs by less than 0.01 when changing
the combination of control parameters; such changes are not displayed by the two digit
approximation in the tables, but they are accounted for by the sorting procedure which fixes
the table ordering. At 30 GHz most of the tested compressors cluster around $C_{r,1}$ = 2.67
and at this level arith-n3 is as good as uses. At 100 GHz the best uses macros cluster
around $C_{r,1}$ = 2.43 - 2.44, equivalent to arith-n2 performances. In our tests uses performs
worst at 8 samples per block without correlation search, but apart from that, in our case
the correlation search does not significantly improve the compression performances. Some
commercial programs such as boa and bzip compress better than uses.

8. Further Constraints: Packet Independence and Packet Length

As an example of global constraints on the on-board compression we discuss the
problems related to Packet Independence and Packet Length.

Data from the LFI must be packetized before being sent to Earth. Packet independence
is considered to be a requirement, so each packet must be self-consistent: its loss or its
erroneous transmission must not interfere with the data retrieval from subsequent packets.
Moreover each packet must carry in "clear" format (i.e. uncompressed) all the information
needed to decode its content. That is: each packet must contain its own decoding table
or decoding information. A typical packet length is some hundreds of bytes, but
a smaller length may be planned if required; at the same time a typical decoding table holds
somewhat less than a hundred bytes, leaving limited room for data.

In addition, for a fixed length $L_u$ of a random input stream (expressed in bits) the
output $L_c$ will not be constant but will change in time with respect to the averaged length
$L_u/C_r$. Of course, it is not possible to predict in advance what the final length of
a given bit stream will be. So either $L_u$ is held fixed, losing compression efficiency, or it is
adapted with some iterative method, maximizing the compression efficiency but at the
cost of a significant slowing of the compression process.

In conclusion, packet independence plus the limited packet length prevents
sending the decoding table, leaving only two possibilities open: i) to send the relevant bytes
only (Maris (1999a)), ii) to use a predefined coding table (Maris (1999b)). Both methods
are described in the next section.



9. Proposed Coding and Compression Scheme 

The basic principle of the first method, named Least Significant Bits Packing (LSBP), is
to send only those bits of the 16 bit output from the ADC which are affected by the signal
and the noise. This is effective for the nominal mission since with the planned quantization
step of 0.3 mK/adu, at one sigma the noise will fill about 21 levels; this will require at least
5 bits out of 16 and it is reasonable to expect a final data flow equivalent to $C_r < 3$. It is
not possible to improve the compression rate much by compressing the resulting 5 bit data
stream, since its entropy would be $H < 5.4$ bits and $C_r < 1.08$.

In order to ensure that the compression is lossless all the samples exceeding the [$-\sigma$,
$+\sigma$] (5 bits) range have to be sent separately, coding at the same time their position
(address) in the stream vector and their value. So, for $N_{\rm bits} < 16$ bits corresponding to a
threshold $x_{\rm th} = 2^{N_{\rm bits}}$, each group of samples stored into a packet is partitioned into two
classes according to their value $x$:

Regular Samples (RS) (def) all those samples for which: $|x| < x_{\rm th}$,

Spike Samples (SS) (def) all those samples for which: $|x| > x_{\rm th}$.

The coding process then consists of two main steps: i) to split the data stream into
Regular and Spike Samples, preserving the original ordering in the stream of Regular
Samples, ii) to store (send) the first $N_{\rm bits}$ bits of the regular samples and, in a separate
area, the 16 bit values and the location in the original data stream of each Spike Sample,
i.e. Spike Samples will require more space to be stored than regular ones. The decoding
process will be the reverse of this packing process.
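
The two coding steps and their inversion can be sketched as follows. This is a minimal illustration, not the on-board implementation. Several choices are assumptions of the sketch: samples are signed integers; a sample is "regular" when it fits a two's complement word of `nbits` bits; spike addresses take 8 bits; and the packet header fields are ignored in the size estimate.

```python
def lsbp_encode(samples, nbits):
    """Split samples into regular ones (packed on nbits) and spikes (16 bit + address)."""
    half = 1 << (nbits - 1)          # assumed convention: regular if -half <= x < half
    mask = (1 << nbits) - 1
    regular_bits = 0                 # Python integer used as a growing bit buffer
    n_regular = 0
    spikes = []                      # (position in the original stream, 16 bit value)
    for pos, x in enumerate(samples):
        if -half <= x < half:
            regular_bits = (regular_bits << nbits) | (x & mask)
            n_regular += 1
        else:
            spikes.append((pos, x & 0xFFFF))
    return {"n": len(samples), "nbits": nbits, "n_regular": n_regular,
            "regular_bits": regular_bits, "spikes": spikes}

def _signed(value, width):
    """Interpret an unsigned field of `width` bits as a two's complement number."""
    return value - (1 << width) if value & (1 << (width - 1)) else value

def lsbp_decode(packet):
    """Reverse of lsbp_encode: rebuild the original ordered stream."""
    nbits, n_regular = packet["nbits"], packet["n_regular"]
    mask = (1 << nbits) - 1
    regular = iter(
        _signed((packet["regular_bits"] >> (nbits * (n_regular - 1 - k))) & mask, nbits)
        for k in range(n_regular))
    spikes = {pos: _signed(v, 16) for pos, v in packet["spikes"]}
    return [spikes[pos] if pos in spikes else next(regular) for pos in range(packet["n"])]

def packed_size_bits(packet, addr_bits=8):
    """Size of the two data areas; addr_bits per spike address is an assumption."""
    return packet["n_regular"] * packet["nbits"] + len(packet["spikes"]) * (16 + addr_bits)

samples = [0, 3, -4, 15, -16, 16, -17, 200, -300, 5]
packet = lsbp_encode(samples, nbits=5)
decoded = lsbp_decode(packet)
print(decoded == samples, packed_size_bits(packet), "bits instead of", 16 * len(samples))
```

In this example six samples are regular and four are spikes, so the two data areas take 6 × 5 + 4 × 24 = 126 bits instead of 160. With real noise statistics the spike fraction, and therefore the gain, depends on the chosen number of bits.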

In this scheme each packet will be divided into two main areas: the Regular Samples
Area (RSA) which holds the stream of Regular Samples and the Spike Samples Area (SSA)
which holds the stream of Spike Samples, plus a number of fields which will contain packing
parameters such as: the number of samples, the number of regular samples, the offset, etc.
Since the number of samples in each area will change randomly it will not be possible to
completely fill a packet. The filling process will leave an empty area in the packet which is on average
smaller than $N_{\rm bits}$.

In Maris (1999a) a first evaluation for the 30 GHz channel is given assuming that the
signal is composed only of white noise plus the CMB dipole. As noticed in section 7.2 the
cosmological dipole affects the compression efficiency, reducing it by a small amount. To
deal with it a possible solution would be to subdivide each data stream into packets, subtract
from each measure of a given packet the integer average of the samples (computed as a 16 bit
integer number) and then compress the residuals. Each integer average will be sent to
Earth together with the related packet, where the operation will be reversed. Since all the
numbers are coded as 16 bit integers all the operations are fully reversible and no round-off
error occurs. However it cannot be excluded that the computational cost of such an operation
will offset the gain in $C_r$.
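
The reversibility of the average subtraction can be shown with a very small sketch. It uses Python integers and hypothetical function names; the five-sample packet is an arbitrary example.

```python
def remove_integer_mean(packet):
    """Subtract the integer average of a packet; both outputs are integers."""
    mean = sum(packet) // len(packet)   # floor division keeps the average an integer
    return mean, [x - mean for x in packet]

def restore(mean, residuals):
    """Ground-side inverse: add the transmitted integer average back."""
    return [r + mean for r in residuals]

packet = [1012, 1008, 1015, 1009, 1011]
mean, residuals = remove_integer_mean(packet)
print(mean, residuals, restore(mean, residuals) == packet)
```

Because only integer additions and subtractions are involved, the original samples are recovered exactly, whatever rounding convention is chosen for the average, provided the same average is transmitted with the packet.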

Two schemes are proposed to perform the cosmological dipole self-adaptation. In
Scheme A the average of the samples in the packet is subtracted before coding and then sent
separately. In Scheme B $x_{\rm th}$ is varied proportionally to the dipole contribution. Both of
them assume that the dipole contribution is approximately constant over a packet length. From
this assumption: $L_p < 200$ samples, i.e. $L_p < 512$ bytes, since for $L_p > 512$ bytes the cosmic
dipole contribution cannot be considered as constant in time. For larger packets a better
modeling (i.e. more parameters) will be required in order not to degrade the compression
efficiency.

A critical point is to fix the best $x_{\rm th}$, i.e. $N_{\rm bits}$, for a given signal statistics, coding
scheme and packet length $L_p$. Even here $C_r$ grows with the packet length but it does not
change monotonically with $x_{\rm th}$. An increase in $x_{\rm th}$ ($N_{\rm bits}$) decreases the number of spike
samples, but increases the size of each regular sample, while the opposite occurs when
$x_{\rm th}$ is decreased, and when $N_{\rm bits} < 4$ bits $C_r < 1$. For both schemes the optimum
is reached for $N_{\rm bits} = 6$ bits, but Scheme A is better than B, with $C_r$(Scheme A,
$L_p$ = 512 bytes) = 2.61 and $C_r$(Scheme B, $L_p$ = 512 bytes) = 2.29.

Compared with arith-n1, this compression rate is smaller by about 14 - 30%. This is
due to two reasons: i) coding by a threshold cut is less effective than applying an optimized
compressor; ii) the results reported in tables H, ^ refer to the compression of a full circle
of data instead of a small packet, resulting in a higher efficiency. However, the efficiency
of this coding method is similar to the efficiency of the bulk of the other true lossless
compressors tested up to now and, when the need to send a decoding table is considered, is
even higher.

The second possible solution to the packeting problem is to use one or more
standardized coding tables for the compression scheme of choice (Maris (1999b)). In this
case the coding table would be loaded into the on-board computer before launch, or from time to
time in flight, and the table would be known in advance at Earth. The major advantages would
be: 1. the coding table need not be sent to Earth; 2. the compression operator is
reduced to a mapping operator which may be implemented as a tabular search, driven by the
input 8 or 16 bits word to be compressed; 3. any compression scheme (Huffman, arithmetic,
etc.) may be implemented by replacing the coding table, without changes to the compression
program; 4. the compression procedure may be easily written in C or in the native assembler
language of the on-board computer or, alternatively, a simple, dedicated hardware may be
implemented and interfaced to the on-board computer. The disadvantages of this scheme
are: 1. each table must reside permanently in the central computer memory unless a
dedicated hardware is interfaced to it; 2. it is difficult to use adaptive schemes in order to
tune the compressor to the input signal; as a consequence Cr may be somewhat smaller
than in the case of a true self-adapting compressor code.

The first problem may be circumvented by limiting the length of the words to be
compressed. In our case the data streams may be divided in chunks of 8 bits and the typical
table size would be ≲ 1 Kbyte. Precomputed coding tables may be accurately optimized by
Monte-Carlo simulations on ground or using signals from ground tests of true hardware.

The second problem may be overcome by using a preconditioning stage, reducing the
statistics of the input signal to the statistics for which the pre-calculated table is optimized.
In addition, several tables may reside in the computer memory and be selected according to the
signal statistics. With a simple reversible statistical preconditioner, about ten tables per
frequency channel would be stored in the computer memory, so that the total memory
occupation would be less than about 40 Kbytes. It cannot be excluded that the two
methods just outlined may be merged.
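The idea of a precomputed coding table used as a pure lookup can be sketched as follows. This is only an illustration under stated assumptions: a static Huffman code is used as an example of a table-based scheme, the function names are hypothetical, and the symbol weights are an invented bell-shaped distribution of 8 bits words standing in for a table optimized on ground by Monte-Carlo simulations.

```python
import heapq
from itertools import count
from math import exp


def build_huffman_table(weights):
    """Build {word: bitstring} from {word: weight}; done once, on ground."""
    tie = count()  # tie-breaker so that dictionaries are never compared
    heap = [(w, next(tie), {word: ""}) for word, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        merged = {word: "0" + code for word, code in t1.items()}
        merged.update({word: "1" + code for word, code in t2.items()})
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]


def encode(words, table):
    """On-board step: a plain tabular search, one lookup per input word."""
    return "".join(table[w] for w in words)


def decode(bits, table):
    """Ground step: the same table, known in advance, inverts the mapping."""
    inverse = {code: word for word, code in table.items()}
    words, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:      # prefix-free code: first match is the word
            words.append(inverse[current])
            current = ""
    return words


# Assumed distribution of 8-bit words (illustrative only).
weights = {v: 1.0 + 1000.0 * exp(-0.5 * ((v - 128) / 12.0) ** 2) for v in range(256)}
TABLE = build_huffman_table(weights)

words = [128, 131, 120, 128, 140, 127, 255, 126]
bits = encode(words, TABLE)
decoded = decode(bits, TABLE)
```

The table has 256 entries, so nothing but the coded bits has to be transmitted. Replacing `TABLE` changes the code without touching `encode`, which is the property stressed in point 3 of the advantages listed above.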



10. Estimation of the Overall Compression Rate 

The overall compression rate (efficiency) is the average of Cr (ηc) over the full set of
detectors. Appendix A illustrates the mathematical aspects of such an average. From (A4):



(15) 



We will limit ourselves to the most probable case Nc = 1 and to the most effective
compressor, arith-n1. The compression parameters Cr,1 and Si at 30 GHz and 100 GHz
are derived from our simulations, while Cr,1 and Si at 44 GHz and 70 GHz are obtained by
linear interpolation of the simulated values as a function of ln σi. After that we obtain:

$$C_r \approx \frac{[\text{numerator illegible in the source}]}{1 + 0.271 \times \ln {\rm VOT}} \qquad (16)$$

As expected the overall compression rate is dominated by the 100 GHz channel. Taking
into account the conservative VOT distribution considered in equation (A8) the overall
compression rate becomes Cr ≈ 2.63, which represents a ~ 2% correction only. It is likely
that this correction will be even smaller, since the amplifier gains will be adjusted in order
to cover a smaller VOT interval. This 2% correction therefore represents the greatest uncertainty
in our estimate of the expected compression rate, and we may conservatively conclude
that:

$$C_{r,\,\text{arith-n1}} \approx 2.65 \pm 0.02 \qquad (17)$$

Recently a new evaluation of the expected instrumental sensitivity led to some changes in
the expected white noise r.m.s. These changes affect in particular the 30 GHz channel, but
do not change significantly the 100 GHz channel, so that the overall compression rate will
be practically unaffected.
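The order of magnitude of the overall rate can be cross-checked with a short Python sketch. It rests on assumptions that are ours and not part of the analysis above: the reduced rates 2.83 (30 GHz) and 2.61 (100 GHz) quoted for arith-n1 are taken as the simulated values, the noise r.m.s. per sample of Table 1 is used as the interpolation variable, and the channels are combined through the average of 1/Cr weighted by the number of detectors, as in equation (A4).

```python
from math import log

n_detectors = {30: 8, 44: 12, 70: 24, 100: 68}        # Table 1
sigma_mk = {30: 2.06, 44: 2.61, 70: 3.16, 100: 4.36}  # Table 1, mK per sample (assumed sigma)
cr_simulated = {30: 2.83, 100: 2.61}                  # arith-n1, simulated channels


def interpolate_cr(freq):
    """Linear interpolation of Cr in ln(sigma) between the 30 and 100 GHz values."""
    x0, x1 = log(sigma_mk[30]), log(sigma_mk[100])
    t = (log(sigma_mk[freq]) - x0) / (x1 - x0)
    return cr_simulated[30] + t * (cr_simulated[100] - cr_simulated[30])


cr = {freq: interpolate_cr(freq) for freq in n_detectors}
n_total = sum(n_detectors.values())

# Average of eta = 1/Cr weighted by the data-flow partition function f = n/n_total.
eta_overall = sum(n / n_total / cr[freq] for freq, n in n_detectors.items())
overall_cr = 1.0 / eta_overall
```

Under these assumptions `overall_cr` is about 2.66, compatible with the value quoted in equation (17) and dominated, as expected, by the 68 detectors at 100 GHz.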

11. Conclusions 

The expected data rate from the Planck Low Frequency Instrument is ~ 260 kbits/sec. 
The bandwidth for the scientific data download currently allocated is just ~ 60 kbit/sec. 
Assuming an equal subdivision of the bandwidth between the two instruments on-board 
Planck, an overall compression rate of a factor 8.7 is required to download all the data. 
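The required factor follows from simple arithmetic, sketched here in Python with hypothetical variable names and the figures quoted above.

```python
lfi_rate_kbit_s = 260.0      # expected LFI data rate
downlink_kbit_s = 60.0       # bandwidth allocated to scientific data
n_instruments = 2            # equal subdivision between the two instruments

lfi_share_kbit_s = downlink_kbit_s / n_instruments     # 30 kbit/s for LFI
required_cr = lfi_rate_kbit_s / lfi_share_kbit_s       # about 8.7
```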

In this work we perform a full analysis on realistically simulated data streams for the
30 GHz and 100 GHz channels in order to fix the maximum compression rate achievable by
loss-less compression methods, without considering explicitly other constraints such as
the power of the on-board Data Processing Unit or the requirements about packet length limits
and independence, but taking into account all the instrumental features relevant to data
acquisition, i.e. the quantization process, the temperature / voltage conversion, the number of
quantization bits and the signal composition.

As a complement to the experimental analysis we perform in parallel a theoretical 
analysis of the maximum compression rate. Such analysis is based on the statistical 
properties of the simulated signal and is able to explain quantitatively most of the
experimental results.

Our conclusions about the statistical analysis of the quantized signal are: I) the
nominally quantized signal has an entropy h ≈ 5.5 bits at 30 GHz and h ≈ 5.9 bits at
100 GHz, which allows a theoretical upper limit for the compression rate ~ 2.9 at 30 GHz
and ~ 2.7 at 100 GHz. II) Quantization may introduce some distortion in the signal
statistics, but the subject requires a deeper analysis.
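The link between the entropy and the theoretical upper limit of the compression rate is simply Cr = 16 bits / h for 16 bits samples. A minimal Python sketch, with hypothetical function names, is:

```python
from collections import Counter
from math import log2


def entropy_bits(samples):
    """Shannon entropy, in bits per sample, of the empirical distribution."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())


def cr_upper_limit(h_bits, word_bits=16):
    """Theoretical upper limit of the compression rate for word_bits samples."""
    return word_bits / h_bits


cr_30ghz = cr_upper_limit(5.5)     # about 2.9
cr_100ghz = cr_upper_limit(5.9)    # about 2.7
h_toy = entropy_bits([0, 1, 2, 3]) # four equiprobable values: 2 bits
```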

Our conclusions about the compression rate are summarized as follows: I) the
compression rate is affected by the quantization step: the greater the quantization
step, the higher Cr (but the worse the measurement accuracy). II) Cr is also affected by the stream
length Lu, i.e. many circles are compressed better than few circles. III) the dependencies
on the quantization step and Lu for each compressor may be summarized by the empirical
formula introduced above. A reduced compression rate Cr,1 is correspondingly defined. IV) Cr is
affected by the signal composition, in particular by the white noise r.m.s. and by the dipole
contribution, the former being the dominant parameter and the latter influencing Cr by less
than ≈ 6%. The inclusion of the dipole contribution reduces the overall compression rate.
The other components (1/f noise, CMB fluctuations, the galaxy, extragalactic sources) have
little or no effect on Cr. In conclusion, for the sake of compression rate estimation, the
signal may be safely represented by a sinusoidal signal plus white noise. V) since the noise
r.m.s. increases with the frequency, the compression rate Cr decreases with the frequency;
for the LFI ΔCr/Cr ≈ 10%. VI) the expected random r.m.s. in the overall compression rate
is less than 1%. VII) we tested a large number of off-the-shelf compressors, with many
combinations of control parameters, so as to cover every conceivable compression method. The
best performing compressor is the arithmetic compression scheme of order 1, arith-n1, the
final Cr,1 being 2.83 at 30 GHz and 2.61 at 100 GHz. This is significantly less than the bare
theoretical compression rate, but when the quantization process is taken properly into
account in the theoretical analysis, this discrepancy is largely reduced. VIII) taking into
account the data flow distribution among different compressors the overall compression rate
for arith-n1 is:

$$C_{r,\,\text{arith-n1}} \approx 2.65 \pm 0.02$$

This result is due to the nature of the signal, which is noise dominated, and clearly excludes
the possibility of reaching the required data flow reduction through loss-less compression only.

Possible solutions deal with the application of lossy compression methods such as
on-board averaging, data rebinning, or averaging of signals from duplicated detectors, in
order to reach an overall lossy compression of about a factor 3.4, which coupled with the
overall loss-less compression rate of about 2.65 should allow the required final
compression rate ~ 8.7 to be reached. However, each of these solutions will introduce heavy constraints
and an important reduction of performance in the final mission design, so that careful and
deep studies will be required in order to choose the best one.

Another solution to the bandwidth problem would be to apply a coarser quantization
step. This has however the drawback of reducing the signal resolution in terms of ΔT/T.

Lastly, the choice of a given compressor cannot be based only on its efficiency obtained
from simulated data, but also on the available on-board CPU and on the official ESA
space qualification: tests with this hardware platform and other compressors will be made
during the project development. Moreover, in the near future long duration balloon flight
experiments and ground experiments (see Lasenby et al. (1998), De Bernardis & Masi
(1998)) will provide a solid base to test and improve compression algorithms. In addition
the final compression scheme will have to cope with requirements about packet length
and packet independence. We briefly discussed these problems, recalling two proposals (Maris
(1999b), Maris (1999a)) which suggest solutions to cope with these constraints.



A. Appendix: Formulation of the Final Data Flow 

In this appendix we will discuss how to account for the distribution of the acquisition 
parameters between the different detectors in the computation of the overall compression 
rate. Since the formalism is simpler we will develop expressions for $\eta_c = 1/C_r$ instead of $C_r$.



We have pointed out in Sect. 5.2 that the compression efficiency is a random variable,
whose distribution is a function of all those parameters which are relevant to fix the
statistical distribution of the input signal. In our case $\nu$, VOT, AFO and $N_{\rm circ}$ are the relevant
parameters, so that the conditioned probability to have a compression efficiency in the
range $\eta_c$, $\eta_c + d\eta_c$ is:

$$P_{\nu,N_{\rm circ}}(\eta_c \mid {\rm AFO}, {\rm VOT})\, d\eta_c . \qquad (A1)$$

This probability may be obtained by our Monte-Carlo simulations for different combinations
of AFO, VOT, $N_{\rm circ}$ and $\nu$. Then the averaged compression efficiency is:

$$\bar\eta_{c,\nu,N_{\rm circ}}({\rm AFO},{\rm VOT}) = \int_0^{+\infty} d\eta_c\, \eta_c\, P_{\nu,N_{\rm circ}}(\eta_c \mid {\rm AFO},{\rm VOT}) . \qquad (A2)$$

Of course we assumed that for any $\nu$, VOT, AFO, $N_{\rm circ}$ the probability distribution is
integrable and normalized to 1, while the integration limits 0, $+\infty$ are to be intended as
formal. There are several detectors for any frequency channel, each one having its own
AFO and VOT, so distributions of AFO and VOT values may be guessed among the
different detectors. Assuming they are integrable and normalized to 1 as well, it is possible
to compute the most probable $\bar\eta_{c,\nu,N_{\rm circ}}$ as:

$$\bar\eta_{c,\nu,N_{\rm circ}} = \int_{{\rm AFO}_{\min}}^{{\rm AFO}_{\max}} d{\rm AFO}\, P_\nu({\rm AFO}) \int_{{\rm VOT}_{\min}}^{{\rm VOT}_{\max}} d{\rm VOT}\, P_\nu({\rm VOT})\, \bar\eta_{c,\nu,N_{\rm circ}}({\rm AFO},{\rm VOT}) . \qquad (A3)$$

With this definition the final overall compression efficiency is:

$$\bar\eta_c = \sum_{\nu = 30, 44, 70, 100\,{\rm GHz}} f_\nu\, \bar\eta_{c,\nu,N_{\rm circ}} \qquad (A4)$$

where $f_\nu$ is the partition function for the data flow through the different detectors.
If $n_{{\rm dtc},\nu}$ is the number of detectors for the frequency channel $\nu$ (see Tab. 1),
$n_{\rm dtc} = \sum_{\nu = 30, 44, 70, 100\,{\rm GHz}} n_{{\rm dtc},\nu} = 112$ is the total number of detectors, and the
number of samples per frequency is a constant, then:

$$f_\nu = \frac{n_{{\rm dtc},\nu}}{n_{\rm dtc}} \qquad (A5)$$

so that for $\nu$ = 30, 44, 70 and 100 GHz respectively $f_\nu$ = 0.0714, 0.1071, 0.2143 and 0.6071.
Finally, the expected data rate for each set of 60 circles is:

$$R_{N_{\rm circ}} = 16\,{\rm bits} \times 60\,{\rm circles} \times 8640\,{\rm samples} \times 112\,{\rm detectors} \times \bar\eta_c . \qquad (A6)$$
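The partition function and the data volume of equation (A6) can be evaluated with a few lines of Python. This is a sketch with hypothetical names; the value of the overall efficiency used in the example, 1/2.65, is the overall compression rate quoted in the main text and is inserted here only for illustration.

```python
n_detectors = {30: 8, 44: 12, 70: 24, 100: 68}   # detectors per channel (Table 1)
n_total = sum(n_detectors.values())              # 112

# Partition function of the data flow, equation (A5).
f = {nu: n / n_total for nu, n in n_detectors.items()}


def compressed_bits(eta_c, bits_per_sample=16, n_circles=60,
                    samples_per_circle=8640, n_det=112):
    """Compressed data volume for one set of circles, equation (A6)."""
    return bits_per_sample * n_circles * samples_per_circle * n_det * eta_c


uncompressed = compressed_bits(1.0)              # 928 972 800 bits
compressed = compressed_bits(1.0 / 2.65)         # illustrative overall efficiency
```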

Presently there are no data to know in advance the distribution of VOT and AFO values
between the different detectors. For this reason in this work we simply assumed flat
distributions of these parameters, identical for each frequency. Moreover, the AFO
contribution is negligible, so that the variance introduced by this parameter is neglected.

(Footnote to equation (A3): here $\int_{{\rm AFO}_{\min}}^{{\rm AFO}_{\max}} d{\rm AFO}\, P_\nu({\rm AFO}) = 1$ and $\int_{{\rm VOT}_{\min}}^{{\rm VOT}_{\max}} d{\rm VOT}\, P_\nu({\rm VOT}) = 1$.)

We assumed that the compression efficiency is approximately a linear function of
ln VOT, or:


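$$\bar\eta_{c,\nu,N_{\rm circ}}({\rm VOT}) \approx \eta_{c,\nu,N_{\rm circ},1} + \eta'_{c,\nu,N_{\rm circ}} \ln {\rm VOT} \qquad (A7)$$

where $\eta'_{c,\nu,N_{\rm circ}}$ is the derivative of $\bar\eta_{c,\nu,N_{\rm circ}}({\rm VOT})$ with respect to ln VOT computed
for VOT = 1 V/K, and $\eta_{c,\nu,N_{\rm circ},1} = \bar\eta_{c,\nu,N_{\rm circ}}({\rm VOT} = 1\,{\rm V/K})$. As an example, at 30 GHz for
arith-n1 the full signal compression efficiency is $\bar\eta_{c,\nu,N_{\rm circ}}({\rm VOT}) \approx 0.3534 + 0.287 \times \ln {\rm VOT(K/V)}$
with an interpolation error less than 0.2%. With these approximations eq. (A3) becomes

$$\bar\eta_{c,\nu,N_{\rm circ}} \approx \eta_{c,\nu,N_{\rm circ},1} + \eta'_{c,\nu,N_{\rm circ}} \int_{0.5\,{\rm V/K}}^{1.5\,{\rm V/K}} d{\rm VOT}\, \ln {\rm VOT} \qquad (A8)$$

and after integration we obtain the final formula

$$\bar\eta_{c,\nu,N_{\rm circ}} \approx \eta_{c,\nu,N_{\rm circ},1} - 0.045229\, \eta'_{c,\nu,N_{\rm circ}} \qquad (A9)$$

For the case in the previous example $\bar\eta_{c,\nu,N_{\rm circ}} \approx 0.3404$, which is equivalent to a
compression efficiency ~ 2.94.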
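The numerical coefficient of equation (A9) and the worked example can be checked with a short Python sketch. The function name is hypothetical; the flat VOT distribution is taken over [0.5, 1.5] V/K, with VOT expressed in V/K.

```python
from math import log


def mean_ln_flat(lo, hi):
    """Average of ln(x) for x uniformly distributed on [lo, hi]."""
    def primitive(x):
        return x * log(x) - x          # antiderivative of ln(x)
    return (primitive(hi) - primitive(lo)) / (hi - lo)


coefficient = mean_ln_flat(0.5, 1.5)   # about -0.045229

eta_1, eta_prime = 0.3534, 0.287       # arith-n1 at 30 GHz, from the example above
eta_bar = eta_1 + eta_prime * coefficient
cr = 1.0 / eta_bar
```

The result is `eta_bar` ≈ 0.3404 and `cr` ≈ 2.94, reproducing the values quoted above.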

To understand the influence of the uncertainty in the VOT distribution
on the final predictions, the computation is repeated for a truncated (i.e. zero outside the VOT
range of interest) normal distribution of VOT. For three choices of the r.m.s. of the VOT distribution,
in the VOT range [0.5, 1.5] V/K, we obtain respectively $\bar\eta_{c,\nu,N_{\rm circ}} \approx$ 0.3494, 0.3439 and
0.3420, which correspond to compression efficiencies 2.86, 2.91 and 2.92 respectively. Similar
results are obtained with a quadratic VOT distribution. In conclusion these predictions
are robust against the shape of the VOT distribution, at least for distributions which are
symmetric around the nominal VOT = 1 V/K value.

We warmly acknowledge a number of people who actively supported this work with
fruitful discussions, in particular F. Argüeso, M. Bersanelli, L. Danese, G. De Zotti, E.
Gaztañaga, J. Herrera, N. Mandolesi, P. Platania, A. Romeo, M. Seiffert and L. Toffolatti,
K. Gorski and all the people involved in the construction of the Healpix pixelisation tools,
largely employed in this work, and Dr. G. Lombardi from Siemens Bochum - Germany
and Dr. G. Maris from ETNOTEAM - Milano for fruitful discussions about compression
principles and their practical application.
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Table 1: Summary of LFI characteristics as recently revised by the LFI Consortium 
(Mandolesi et al. (1999)). Data rates are tabulated for the case of a sampling rate equal to
8640 samples per circle, constant in time and frequency.

Center frequency ν [GHz]                           30       44       70      100
Number of detectors n_dtc,ν                         8       12       24       68
Angular resolution, FWHM [']                     33.6     22.9     14.4     10.0
Bandwidth [Δν/ν]                                  0.2      0.2      0.2      0.2
10^6 ΔT/T                                         1.6      2.4      3.6      4.3
ΔT_ant [μK]                                       5.1      7.8     10.6     12.4
ΔT_ant [mK] per sampling and receiver            2.06     2.61     3.16     4.36
Number of samples for beam                       13.4      9.2      5.8      4.0
Data rate for detector [Kb/sec]                   2.3      2.3      2.3      2.3
Data rate for frequency [Kb/sec]                 18.4     27.6     55.3    156.7
Uncompressed data rate partition function [%]    7.14    10.71    21.43    60.71
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Table 2: Entropy for 16 bits samples at 30 and 100 GHz, for only White Noise and Full 
Signal as a function of Lchunck. Total Entropy refers to the entropy computed over the full
set of samples (8640 x 60); Mean and RMS Entropy are the mean and RMS over different
realizations of chunks of samples of length Lchunck. The same holds for the Cr columns, whose values are
derived from the corresponding values of the entropy. The quantization step is Δ = 0.305
mK/adu. Entries marked ... are illegible in the source.

30 GHz, White Noise

              Entropy (bits)                 Cr
Lchunck    Total    Mean     RMS       Total   Mean   RMS
16         5.1618   3.5596   0.1989    3.10    4.49   0.251
32         5.1618   4.1815   0.1658    3.10    3.83   0.152
64         5.1618   4.6108   0.1262    3.10    3.47   0.095
135        5.1618   4.8791   0.0890    3.10    3.28   0.060
8640       5.1618   5.1561   0.0114    3.10    3.10   0.007
17280      5.1618   5.1589   0.0061    3.10    3.10   0.004

30 GHz, Full Signal

              Entropy (bits)                 Cr
Lchunck    Total    Mean     RMS       Total   Mean   RMS
16         5.5213   3.5602   0.1982    2.90    4.49   0.250
32         5.5213   4.1849   0.1664    2.90    3.82   0.152
64         5.5213   4.6162   0.1278    2.90    3.47   0.096
135        5.5213   4.8885   0.0893    2.90    3.27   0.060
8640       5.5213   5.5119   0.0176    2.90    2.90   0.009
17280      5.5213   5.5157   0.0118    2.90    2.90   0.006

100 GHz, White Noise

              Entropy (bits)                 Cr
Lchunck    Total    Mean     RMS       Total   Mean   RMS
16         5.7436   3.6962   0.1740    2.79    4.33   0.204
32         5.7436   4.4174   0.1521    2.79    3.62   0.125
64         5.7436   4.9627   0.1230    2.79    3.22   0.080
135        5.7436   5.3354   0.0875    2.79    3.00   0.049
8640       5.7436   5.7352   0.0115    2.79    2.79   0.006
17280      5.7436   5.7394   0.0063    2.79    2.79   0.003

100 GHz, Full Signal

              Entropy (bits)                 Cr
Lchunck    Total    Mean     RMS       Total   Mean   RMS
16         5.8737   3.6970   0.1734    2.72    4.33   0.203
32         5.8737   4.4186   0.1526    2.72    3.62   0.125
64         5.8737   4.9655   0.1224    2.72    3.22   0.079
135        5.8737   5.3419   0.0887    2.72    3.00   0.050
8640       5.8737   ...      ...       2.72    ...    ...

Table 3: Quantization effect on the Kolmogorov-Smirnov D test applied to simulated data.
Δ is the quantization step.

Δ (mK/adu)                      1.220    0.610    0.406
P(D < 0.1475, White Noise)      0.28     0.70     0.84
P(D < 0.1475, Signal)           0.27     0.71     0.86
D95                             0.2449   0.1851   0.1678
P(D < D95, Signal)              0.95     0.95     0.95
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Table 4: Tested compressors and related parameters. The Macro column contains the names 
of the macros running a given compression Code with a given combination of Parameters.
A ditto mark (") in the Code column means the same code as in the row above.

Macro       Code       Parameters          Note
ahuff-c     ahuff-c                        Adaptive Huffman, Nelson & Gailly (1996)
ar          ar
arc         arc
arha        arhangel                       http://www.geocities.com/SiliconValley/Lab/6606
arhaASC     "          -1                  ASC method
arhaHSC     "          -2                  HSC method
arith-c     arith-c                        Arithmetic coding, Nelson & Gailly (1996)
arith-n     arith-n                        Adaptive Arithmetic Coding (AC), Nelson & Gailly (1996)
arith-n0    "          -o 0                Zeroth order Arithmetic coding
arith-n1    "          -o 1                First order AC
arith-n2    "          -o 2                Second order AC
arith-n3    "          -o 3                Third order AC
arith-n4    "          -o 4                Fourth order AC
arith-n5    "          -o 5                Fifth order AC
arith-n6    "          -o 6                Sixth order AC
arith-n7    "          -o 7                Seventh order AC
arj         arj
arj0        "          -m 0                method 0 (no compression)
arj1        "          -m 1                method 1
arj2        "          -m 2                method 2
arj3        "          -m 3                method 3
arj4        "          -m 4                method 4
boa         boa
bzip        bzip2090
bziprb      "          -repetitive-best    best compression of repetitive blocks
bziprf      "          -repetitive-fast    fast compression of repetitive blocks
gzip1       gzip       -1                  fast compression
gzip9       "          -9                  best compression
huff-c      huff-c                         Huffman, Nelson & Gailly (1996)
jar         jar32
jar1        "          -m1                 method 1
jar2        "          -m2                 method 2
jar3        "          -m3                 method 3
jar4        "          -m4                 method 4
lha         lha
lzss        lzss
lzw12       lzw12

Table 5: See Table 4.



Macro       Code     Parameters              Note
pkzip       pkzip                            from PKWARE
pkzip-ef    "        -ef                     fast compression
pkzip-en    "        -en                     normal compression
pkzip-es    "        -es                     super fast compression
pkzip-ex    "        -ex                     extra compression
rar-m0      rar      -m0                     level 0 compression
rar-m1      "        -m1                     level 1 compression
rar-m2      "        -m2                     level 2 compression
rar-m3      "        -m3                     level 3 compression
rar-m4      "        -m4                     level 4 compression
rar-m5      "        -m5                     level 5 compression
splint      splint
szip00      szip                             Rice Algorithm and Rice compression chip simulator
szip0ec     "        -ec                     entropy coding compression mode
szip0nu     "        -nn                     nearest neighbor compression mode
szipc0      "        -chip                   compress exactly as chip
szipcec     "        -chip -ec               as szip0ec + chip compression
szipcnu     "        -chip -nn               as szip0nu + chip compression
uses        uses     -n 16 -s 64 -rr         Universal Source Encoder for Space: 16 bits per sample,
                                             64 samples for scanline, correlates near samples (CNS)
uses008     "        -n 16 -s 8 -j 8         8 samples, 8 samples per block
uses008rr   "        -n 16 -s 8 -rr -j 8     as uses008 + CNS
uses016     "        -n 16 -s 16             16 samples per block
uses016rr   "        -n 16 -s 16 -rr         16 samples per block + CNS
uses032     "        -n 16 -s 32             32 samples per block
uses032rr   "        -n 16 -s 32 -rr         32 samples per block + CNS
uses064     "        -n 16 -s 64             64 samples per block
uses064rr   "        -n 16 -s 64 -rr         64 samples per block + CNS
uses320     "        -n 16 -s 320            320 samples per block
uses320rr   "        -n 16 -s 320 -rr        320 samples per block + CNS
uses960     "        -n 16 -s 960            960 samples per block
uses960rr   "        -n 16 -s 960 -rr        960 samples per block + CNS
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Table 6: Compression Rates at 30 GHz, white noise only 



Column headers and entries marked ? are illegible in the source; the table is truncated after the last row shown.

Macro       Swap   ?      ?       ?   A1        A2        ?   B1         B2         B3         B4         B5
arith-n1           3.??   0.293   ?   -0.0197   0.00314   ?   -0.07749   0.03629    -0.00978   0.00153    -0.00010
bzip        y      2.85   0.268   ?   -0.0169   0.00273   ?   -0.06467   0.06598    -0.03615   0.00918    -0.00084
bziprb      y      2.85   0.268   ?   -0.0169   0.00273   ?   -0.06467   0.06598    -0.03615   0.00918    -0.00084
bziprf      y      2.85   0.268   ?   -0.0169   0.00273   5   -0.06467   0.06598    -0.03615   0.00918    -0.00084
arith-n2           2.82   0.324   ?   -0.0453   0.00601   ?   -0.10166   0.01394    -0.00060   0.00000    0.00000
boa         y      2.81   0.247   ?   -0.0129   0.00000   ?   0.06445    -0.08272   0.03984    -0.00629   0.00023
?           y      2.68   0.281   ?   -0.0367   0.00525   ?   0.13641    -0.22929   0.14843    -0.03812   0.00343
arha        y      2.68   0.281   ?   -0.0367   0.00525   ?   0.13641    -0.22929   0.14843    -0.03812   0.00343
uses320rr   y      2.68   0.241   ?   0.0000
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Table 7: Compression Rates at 30 GHz, full signal 
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Table 8: Compression Rates at 100 GHz, white noise only 
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Fig. 1.- 



Scheme for the functional model of the acquisition pipeline.
Fig. 2.- Cr as a function of VOT and Nbits. It is assumed Vmin = -10 V, Vmax = +10
V and σ = 2 × 10^-3 K. The curve for 12 bits is scaled by a factor 0.1 to allow a better
comparison with the 16 bits curve.
Fig. 3.- Statistical distribution of 8 and 16 bits words for LFI simulated signals. The upper
row is for 30 GHz, the lower row for 100 GHz. The left column shows the distributions of 16 bits words
from the quantized signals, the right column those of 8 bits words from the quantized signal. The full
line is the distribution for pure white noise, the dashed line is the distribution for the full signal.
Fig. 4.- The entropy distribution per chunk for Lchunck = 64 samples, for the full signal at
30 GHz.
[Panel titles: 30 GHz - Cr vs signal composition; 100 GHz - Cr vs signal composition]
Fig. 5.- Figures of merit for the arithmetic compression of order 1 (arith-n1) for the 30 GHz
and 100 GHz channels. Here AFO = 0 V, VOT = 1.0 V/K, Nc = 1. The compression
efficiency is plotted as a function of the incremental complexity of the signal composition:
wn means white noise only, +1/f plus 1/f, +cmb plus CMB, +dipo plus dipole, +gal plus
galaxy, +ex plus extragalactic sources.
[Panel titles: 30 GHz - Full signal and noise; 100 GHz - Full signal and noise]
Fig. 6.- Compression rates for arith-n1 as a function of VOT and Nc for a full
simulated signal (wn + 1/f + dp + cmb + dipo + gal + ex) (see also figure 5 for details).
From top to bottom: Squares: VOT = 1.5 V/K, Diamonds: VOT = 1.0 V/K, Triangles:
VOT = 0.5 V/K.



